BACKGROUND OF THE INVENTION



This application claims priority from provisional applications numbered 60/265,402 filed 
January 30, 2001 and 60/301,609 filed June 27, 2001. 

1. FIELD OF THE INVENTION 

The present invention relates to the implementation of lossless and near-lossless source 
coding for multiple access networks. 

2. BACKGROUND ART 
Source Coding

Source coding, also known as data compression, treats the problem of efficiently 
representing information for data transmission or storage. 

Data compression has a wide variety of applications. In the area of data transmission, 
compression is used to reduce the amount of data transferred between the sources and the 
destinations. The reduction in data transmitted decreases the time needed for transmission and 
increases the overall amount of data that can be sent. For example, fax machines and modems 
all use compression algorithms so that we can transmit data many times faster than otherwise 
possible. The Internet uses many compression schemes for fast transmission; the images and 
videos we download from some bulletin boards are usually in a compressed format. 
In the area of data storage, data compression allows us to store more information in limited storage space by representing the data efficiently. For example, digital cameras use image compression schemes to store more photos on their memory cards, DVDs use video and audio compression schemes to store movies on portable disks, and text compression schemes can reduce the size of text files on computer hard disks.

In many electronic and computer applications, data is represented by a stream of binary digits called bits (0s and 1s). Here is an example overview of the steps involved in compressing data for transmission. The compression begins with the data itself at the sender. An encoder encodes the data into a stream with a smaller number of bits. For example, an image file to be sent across a computer network may originally be represented by 40,000 bits. After encoding, the number of bits is reduced to 10,000. In the next step, the encoded data is sent to the destination, where a decoder decodes the data. In the example, the 10,000 bits are received and decoded to give a reconstructed image. The reconstructed image may be identical to or different from the original image.

Here is another example of the steps involved in compressing data for storage. In making MP3 audio files, people use special audio compression schemes to compress music and store it on compact discs or in the memory of MP3 players. For example, 700 minutes of MP3 music could be stored on a 650 MB CD that normally stores only 74 minutes of music without MP3 compression. To listen to the music, we use MP3 players or MP3 software to decode the compressed music files and obtain reconstructed music, which usually has lower quality than the original music.
When transmitting digital data from one part of a computer network to another, it is often
useful to compress the data to make the transmission faster. In certain networks, known as 
multiple access networks, current compression schemes have limitations. The issues associated 
with such systems can be understood by a review of data transmission, compression schemes, 
and multiple access networks. 

Lossless and Lossy Compression 

There are two types of compression, lossless and lossy. Lossless compression techniques 
involve no loss of information. The original data can be recovered exactly from the losslessly 
compressed data. For example, text compression usually requires the reconstruction to be 
identical to the original text, since very small differences may result in very different meanings. 
Similarly, computer files, medical images, bank records, military data, etc., all need lossless 
compression. 

Lossy compression techniques involve some loss of information. If data have been 
compressed using lossy compression, the original data cannot be recovered exactly from the 
compressed data. Lossy compression is used where some sacrifice in reconstruction fidelity is 
acceptable in light of the higher compression ratios of lossy codes. For example, in transmitting 
or storing video, exact recovery of the video data is not necessary. Depending on the required 
quality of the reconstructed video, various amounts of information loss are acceptable. Lossy 
compression is widely used in Internet browsing, video, image and speech transmission or 
storage, personal communications, etc. 
One way to measure the performance of a compression algorithm is to measure the rate (average length) required to represent a single sample, i.e., $R = \sum_x P(x)\, l(x)$, where $l(x)$ is the length of the codeword for symbol x and $P(x)$ is the probability of x. Another way is to measure the distortion, i.e., the average difference between the original data and the reconstruction.
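As a quick check of the rate formula, the short Python sketch below evaluates $R = \sum_x P(x)\, l(x)$ for the fourth (instantaneous) code of Table 1. The dictionary and function names are illustrative only.

```python
# Source probabilities P(x) and codewords of the fourth (instantaneous) code in Table 1.
probs = {"1": 0.45, "2": 0.25, "3": 0.1, "4": 0.2}
code = {"1": "1", "2": "01", "3": "001", "4": "000"}


def average_rate(probs, code):
    """Return R = sum over x of P(x) * l(x), in bits per symbol."""
    return sum(p * len(code[x]) for x, p in probs.items())


print(round(average_rate(probs, code), 4))  # 1.85
```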

Fixed-length Code 

A fixed-length code uses the same number of bits to represent each symbol in the alphabet. For example, ASCII is a fixed-length code: it uses 7 bits to represent each character. The codeword for the letter a is 1100001, that for the letter A is 1000001, and so on.
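The 7-bit ASCII codewords quoted above can be reproduced with Python's built-in `ord` and `format`. This minimal sketch (the helper name `ascii_codeword` is illustrative) also makes the fixed-length property visible: every codeword has exactly 7 bits.

```python
def ascii_codeword(ch):
    """Return the fixed-length 7-bit ASCII codeword of a single character."""
    code_point = ord(ch)
    if code_point > 127:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    return format(code_point, "07b")


print(ascii_codeword("a"))  # 1100001
print(ascii_codeword("A"))  # 1000001
```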

Variable-length Code 

A variable-length code does not require all codewords to have the same length, so we may use a different number of bits to represent different symbols. For example, we may use shorter codewords for more frequent symbols and longer codewords for less frequent symbols; thus on average we can use fewer bits per symbol. Morse code is an example of a variable-length code for the English alphabet. It uses a single dot (•) to represent the most frequent letter, E, and four symbols, dash, dash, dot, dash (--•-), to represent the much less frequent letter Q.
Non-singular, Uniquely Decodable, Instantaneous, and Prefix-free Code

Table 1. Classes of Codes

Symbol  P(X)  Singular  Non-singular, but not  Uniquely decodable, but  Instantaneous
                        uniquely decodable     not instantaneous
1       0.45  0         1                      1                        1
2       0.25  0         10                     10                       01
3       0.1   1         0                      100                      001
4       0.2   10        110                    000                      000


A non-singular code assigns a distinct codeword to each symbol in the alphabet. A non-singular code provides an unambiguous description of each single symbol. However, if we wish to send a sequence of symbols, a non-singular code does not guarantee an unambiguous description. In Table 1, the first code assigns identical codewords to symbol '1' and symbol '2' and is thus a singular code. The second code is non-singular; however, the binary description of the sequence '12' is '110', which is the same as the binary description of the sequence '113' and of the symbol '4'. Thus we cannot uniquely decode those sequences of symbols.
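The ambiguity of the second code can be reproduced directly by concatenating codewords. In this small Python sketch, `encode` is an illustrative helper, and the codewords are copied from the second code column of Table 1.

```python
# Second code of Table 1: non-singular, but not uniquely decodable.
code2 = {"1": "1", "2": "10", "3": "0", "4": "110"}


def encode(sequence, code):
    """Concatenate the codewords of the symbols in `sequence`."""
    return "".join(code[s] for s in sequence)


for seq in ["12", "113", "4"]:
    print(seq, "->", encode(seq, code2))
# 12 -> 110
# 113 -> 110
# 4 -> 110
```

All three source sequences map to the same bit string, even though every individual codeword is distinct.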

We define uniquely decodable codes as follows. A uniquely decodable code is one where no two sequences of symbols have the same binary description. That is to say, any encoded sequence in a uniquely decodable code has only one possible source sequence producing it. However, one may need to look at the entire encoded bit string before determining even the first symbol of the corresponding source sequence. The third code in Table 1 is an example of a uniquely decodable code for the source alphabet. On receiving the encoded bit '1', one cannot determine which of the three symbols '1', '2', '3' was transmitted until future bits are received.

An instantaneous code is one that can be decoded without referring to future codewords. The third code is not instantaneous, since the binary description of symbol '1' is a prefix of the binary descriptions of symbols '2' and '3', and the description of symbol '2' is also a prefix of the description of symbol '3'. We call a code a prefix code if no codeword is a prefix of any other codeword. A prefix code is always an instantaneous code: since the end of a codeword is always immediately recognizable, the decoder can separate the codewords without looking at future encoded symbols. An instantaneous code is also a prefix code, except in the case of multiple access source codes, where an instantaneous code need not be prefix-free (this case is discussed later). The fourth code in Table 1 is an example of an instantaneous code with the prefix-free property.

The nesting of these definitions is: the set of instantaneous codes is a subset of the set of uniquely decodable codes, which is a subset of the set of non-singular codes.
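The class of each code in Table 1 can also be checked mechanically. The Python sketch below assumes the standard Sardinas-Patterson test for unique decodability (a well-known test that this text does not itself use) together with a direct prefix check. For a single source, instantaneous and prefix-free coincide, so the prefix check stands in for the instantaneous test. The function names are illustrative.

```python
def is_prefix_free(words):
    """True if no codeword is a prefix of a different codeword (duplicates count as prefixes)."""
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )


def dangling(prefixes, words):
    """Non-empty suffixes w such that p + w = s for some p in prefixes and s in words."""
    return {s[len(p):] for p in prefixes for s in words if len(s) > len(p) and s.startswith(p)}


def is_uniquely_decodable(words):
    """Sardinas-Patterson test: UD iff no set of dangling suffixes contains a codeword."""
    code = set(words)
    if len(code) < len(words):  # a singular code is never uniquely decodable
        return False
    suffixes, seen = dangling(code, code), set()
    while suffixes:
        if suffixes & code:
            return False
        key = frozenset(suffixes)
        if key in seen:  # the sequence of suffix sets has started to repeat
            return True
        seen.add(key)
        suffixes = dangling(code, suffixes) | dangling(suffixes, code)
    return True


def classify(words):
    if len(set(words)) < len(words):
        return "singular"
    if not is_uniquely_decodable(words):
        return "non-singular, not uniquely decodable"
    if not is_prefix_free(words):
        return "uniquely decodable, not instantaneous"
    return "instantaneous (prefix-free)"


# Codewords for symbols 1, 2, 3, 4 in the four code columns of Table 1.
table1 = [["0", "0", "1", "10"], ["1", "10", "0", "110"],
          ["1", "10", "100", "000"], ["1", "01", "001", "000"]]
for words in table1:
    print(words, "->", classify(words))
```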

Tree Representation 

We can always construct a binary tree to represent a binary code. The tree starts from a single node (the root) and has at most two branches at each node. The two branches correspond to '0' and '1', respectively. (Here we adopt the convention that the left branch corresponds to '0' and the right branch corresponds to '1'.) The binary trees for the second through fourth codes in Table 1 are shown as trees 100, 101 and 102 of Figure 1, respectively.

The codeword of a symbol can be obtained by traversing from the root of the tree to the node representing that symbol. Each branch on the path contributes a bit ('0' from each left branch and '1' from each right branch) to the codeword. In a prefix code, the codewords always reside at the leaves of the tree. In a non-prefix code, some codewords reside at internal nodes of the tree.

For prefix codes, the decoding process is made easier with the help of the tree representation. The decoder starts from the root of the tree. Upon receiving an encoded bit, the decoder chooses the left branch if the bit is '0' or the right branch if the bit is '1'. This process continues until the decoder reaches a tree node representing a codeword. If the code is a prefix code, the decoder can then immediately determine the corresponding symbol and return to the root to decode the next codeword.
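The tree-walking decoder just described can be sketched in a few lines of Python. The tree is stored as nested dictionaries, with key '0' for the left branch and key '1' for the right branch. This layout assumes a prefix code, in which every codeword ends at a leaf. The function names are illustrative.

```python
def build_tree(code):
    """Build the code tree as nested dicts; key '0' is the left branch, '1' the right.

    Assumes a prefix code, so every codeword ends at a leaf.
    """
    root = {}
    for symbol, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol  # the leaf stores the decoded symbol
    return root


def decode(bits, root):
    """Walk from the root, branching on each bit; emit a symbol at every leaf."""
    symbols, node = [], root
    for bit in bits:
        node = node[bit]
        if not isinstance(node, dict):  # reached a leaf
            symbols.append(node)
            node = root  # restart at the root for the next codeword
    if node is not root:
        raise ValueError("bit string ends in the middle of a codeword")
    return symbols


code4 = {"1": "1", "2": "01", "3": "001", "4": "000"}  # fourth code of Table 1
tree = build_tree(code4)
print(decode("100101000", tree))  # ['1', '3', '2', '4']
```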

Block Code 

In the example given in Table 1, each single symbol ('1', '2', '3', '4') is assigned a codeword. We can also group the symbols into blocks of length n, treat each block as a super symbol in the extended alphabet, and assign each super symbol a codeword. This code is called a block code with block length n (or coding dimension n). Table 2 below gives an example of a block code with block length n = 2 for the source alphabet given in Table 1.
Table 2

Block of Symbols  Probability  Code
11                0.2025       00
12                0.1125       010
13                0.045        10010
14                0.09         1000
21                0.1125       111
22                0.0625       1101
23                0.025        11001
24                0.05         0111
31                0.045        10110
32                0.025        101110
33                0.01         110001
34                0.02         110000
41                0.09         1010
42                0.05         0110
43                0.02         101111
44                0.04         10011
Huffman Code



A Huffman code is an optimal (shortest average length) prefix code for a given distribution. It is widely used in many compression schemes. The Huffman procedure is based on the following two observations about optimal prefix codes. In an optimal prefix code:

1. Symbols with higher probabilities have codewords no longer than those of symbols with lower probabilities.

2. The two longest codewords have the same length and differ only in the last bit; they correspond to the two least probable symbols.

Thus the two leaves corresponding to the two least probable symbols are offspring of the same node.

The Huffman code design proceeds as follows. First, we sort the symbols in the alphabet according to their probabilities. Next, we connect the two least probable symbols in the alphabet to a single new node. This new node (representing a new symbol), together with all the symbols of the original alphabet except the two least probable ones, forms a reduced alphabet; the probability of the new symbol is the sum of the probabilities of its offspring (i.e., the two least probable symbols). We then sort the nodes of the reduced alphabet according to their probabilities and apply the same rule to generate a parent node for the two least probable symbols in the reduced alphabet. This process continues until a single node (i.e., the root) remains. The codeword of a symbol is obtained by traversing from the root of the tree to the leaf representing that symbol. Each branch on the path contributes a bit ('0' from each left branch and '1' from each right branch) to the codeword.

The fourth code in Table 1 is a Huffman code for the example alphabet. The procedure for building it is shown in Figure 2A.
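The merging procedure can be sketched with a priority queue. This is a minimal Python sketch, not the patent's own implementation. Because it places the less probable node on the '0' branch, its codeword labels can differ from those in Table 1 (0 and 1 swapped at some nodes), but the codeword lengths, and hence the rate, are the same. Applied to the extended alphabet of block length 2, it gives the same average rate as the Table 2 code. The function names are illustrative.

```python
import heapq
from itertools import count, product


def huffman_code(probs):
    """Return {symbol: codeword} for a binary Huffman code of the p.m.f. `probs`."""
    if len(probs) == 1:
        return {s: "0" for s in probs}
    tiebreak = count()  # unique counter so heap entries never compare the dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)   # least probable node: branch '0'
        p1, _, right = heapq.heappop(heap)  # next least probable node: branch '1'
        merged = {s: "0" + w for s, w in left.items()}
        merged.update({s: "1" + w for s, w in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def rate(probs, code, n=1):
    """Average codeword length per source symbol for blocks of length n."""
    return sum(p * len(code[s]) for s, p in probs.items()) / n


p = {"1": 0.45, "2": 0.25, "3": 0.1, "4": 0.2}
code1 = huffman_code(p)
print({s: len(w) for s, w in sorted(code1.items())})  # {'1': 1, '2': 2, '3': 3, '4': 3}
print(round(rate(p, code1), 4))  # 1.85

# Extended alphabet of blocks of length 2 for an i.i.d. source (compare Table 2).
p2 = {a + b: p[a] * p[b] for a, b in product(p, repeat=2)}
code_pairs = huffman_code(p2)
print(round(rate(p2, code_pairs, n=2), 4))  # 1.8375
```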

Entropy Code 

The entropy of source X is defined as $H(X) = -\sum_x p(x) \log p(x)$, with logarithms taken to base 2 so that rates are measured in bits. Given a probability model, the entropy is the lowest rate at which the source can be losslessly compressed.

The rate R of the Huffman code for source X is bounded below by the entropy H(X) of source X and bounded above by the entropy plus one bit, i.e., $H(X) \le R < H(X) + 1$. Consider a data sequence $X^n = (X_1, X_2, X_3, \ldots, X_n)$ where each element of the sequence is independently and identically generated. If we code the sequence $X^n$ using a Huffman code, the resulting rate (average length per symbol) satisfies $\frac{H(X^n)}{n} \le R < \frac{H(X^n) + 1}{n}$. Since the elements are i.i.d., $H(X^n) = nH(X)$, so $H(X) \le R < H(X) + \frac{1}{n}$. Thus when the block length (or coding dimension) n is arbitrarily large, the achievable rate is arbitrarily close to the entropy H(X). We call this kind of code an 'entropy code', i.e., a code whose rate is arbitrarily close to the entropy when the coding dimension is arbitrarily large.
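The bound can be checked numerically for the Table 1 source. The Python sketch below computes H(X) in bits and tests $H(X) \le R < H(X) + 1/n$ against the Huffman rates quoted for Table 1 and Table 2 (1.85 and 1.8375 bits per symbol). The helper name `entropy` is illustrative.

```python
import math


def entropy(probs):
    """H(X) = -sum_x p(x) log2 p(x) in bits; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


p = {"1": 0.45, "2": 0.25, "3": 0.1, "4": 0.2}
H = entropy(p)
print(f"H(X) = {H:.4f} bits/symbol")  # H(X) = 1.8150 bits/symbol

# Huffman rates for block lengths n = 1 (Table 1) and n = 2 (Table 2).
for n, R in [(1, 1.85), (2, 1.8375)]:
    # For an i.i.d. source H(X^n) = n H(X), so the bound reads H(X) <= R < H(X) + 1/n.
    print(n, H <= R < H + 1 / n)  # True for both
```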

Arithmetic Code 
Arithmetic coding is another, increasingly popular, entropy code that is used widely in many compression schemes. For example, it is used in the JPEG 2000 compression standard.

We can achieve efficient coding by using long blocks of source symbols. For example, for the alphabet given in Table 1, the Huffman code rate is 1.85 bits per symbol. Table 2 gives an example of a Huffman code for the corresponding extended alphabet with block length two; the resulting rate is 1.8375 bits per symbol, which is an improvement. However, Huffman coding is not a good choice for coding long blocks of symbols. To assign a codeword to a particular sequence of length n, it requires calculating the probabilities of all sequences of length n and constructing the complete Huffman coding tree (equivalent to assigning codewords to all sequences of length n). Arithmetic coding is a better scheme for block coding: it assigns a codeword to a particular sequence of length n without having to generate codewords for all sequences of length n. Thus it is a low-complexity, high-dimensional coding scheme.

In arithmetic coding, a unique identifier is generated for each source sequence. This identifier is then assigned a unique binary code. In particular, the data sequence $x^n$ is represented by an interval of the line [0, 1). We describe $x^n$ by describing the midpoint of the corresponding interval to sufficient accuracy to avoid confusion with neighboring intervals. This midpoint is the identifier for $x^n$. We find the interval for $x^n$ recursively. First, [0, 1) is broken into intervals corresponding to all possible values of $x_1$. Next, the interval for the observed $x_1$ is broken into subintervals corresponding to all possible values of $x_1 x_2$, and so on. Given the interval $A \subseteq [0, 1)$ for $x^k$ for some $0 \le k < n$ (the interval for $x^0$ is [0, 1)), the subintervals for $x^k x_{k+1}$ are ordered subintervals of A with lengths proportional to $p(x_{k+1})$.

For the alphabet given in Table 1, Figure 2B shows how to determine the interval for the sequence '132'. Once the interval [0.33525, 0.3465) is determined for '132', we can use a binary code to describe its midpoint, 0.340875, to sufficient accuracy as the binary representation of the sequence '132'.

In arithmetic coding, the description length of the data sequence $x^n$ is $l(x^n) = \lceil -\log p_X(x^n) \rceil + 1$, where $p_X(x^n)$ is the probability of $x^n$. This ensures that the intervals corresponding to different codewords are disjoint and that the code is prefix-free. Thus the average rate per symbol of the arithmetic code is

$$R = \frac{1}{n}\sum_{x^n} p_X(x^n)\, l(x^n) = \frac{1}{n}\sum_{x^n} p_X(x^n)\left(\lceil -\log p_X(x^n) \rceil + 1\right).$$

The rate R is then bounded as

$$\frac{H(X^n)}{n} \le R < \frac{H(X^n) + 2}{n},$$

which shows that R is arbitrarily close to the source entropy when the coding dimension n is arbitrarily large.
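The interval recursion and the description length can be checked with exact rational arithmetic. The Python sketch below is an illustration in the style of Shannon-Fano-Elias coding: it truncates the interval midpoint to $l(x^n)$ bits. A practical arithmetic coder instead works incrementally with finite precision, which this sketch does not attempt. The function names are hypothetical.

```python
import math
from fractions import Fraction

# Table 1 probabilities as exact fractions; dict order fixes the order 1, 2, 3, 4 on [0, 1).
P = {"1": Fraction(9, 20), "2": Fraction(1, 4), "3": Fraction(1, 10), "4": Fraction(1, 5)}


def interval(sequence, probs):
    """Return the interval [low, high) of `sequence` by recursive subdivision of [0, 1)."""
    low, width = Fraction(0), Fraction(1)
    for s in sequence:
        for t, p in probs.items():  # skip the sub-intervals of symbols ordered before s
            if t == s:
                break
            low += width * p
        width *= probs[s]  # new width = old width * p(s), so width = p(sequence)
    return low, low + width


def codeword(sequence, probs):
    """First l = ceil(-log2 p) + 1 bits of the binary expansion of the interval midpoint."""
    low, high = interval(sequence, probs)
    length = math.ceil(-math.log2(float(high - low))) + 1
    mid = (low + high) / 2
    return format(math.floor(mid * 2**length), f"0{length}b")


low, high = interval("132", P)
print(float(low), float(high), float((low + high) / 2))  # 0.33525 0.3465 0.340875
print(codeword("132", P))  # 01010111
```

Here p('132') = 0.01125, so l = ⌈6.47⌉ + 1 = 8 bits. The dyadic interval [87/256, 88/256) named by the codeword lies entirely inside [0.33525, 0.3465), which is why the code stays prefix-free.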

Multiple Access Networks 

A multiple access network is a system with several transmitters sending information to a 
single receiver. One example of a multiple access system is a sensor network, where a collection 
of separately located sensors sends correlated information to a central processing unit. Multiple 
access source codes (MASCs) yield efficient data representation for multiple access systems 
when cooperation among the transmitters is not possible. An MASC can also be used in data
storage systems, for example, archive storage systems where information stored at different 
times is independently encoded but all information can be decoded together if this yields greater 
efficiency. 

In the MASC configuration (also known as the Slepian-Wolf configuration) depicted in Figure 3A, two correlated information sequences $\{X_i\}_{i=1}^{\infty}$ and $\{Y_i\}_{i=1}^{\infty}$ are drawn i.i.d. (independently and identically distributed) according to the joint probability mass function (p.m.f.) p(x,y). The encoder for each source operates without knowledge of the other source. The decoder receives the encoded bit streams from both sources. The rate region for this configuration is plotted in Figure 3B. This region describes the rates achievable in this scenario for sufficiently large coding dimension n, with decoding error probability $P_e^{(n)}$ approaching zero as the coding dimension grows. Making these ideas applicable in practical network communication scenarios requires MASC design algorithms for finite dimensions. We consider two coding scenarios. The first is lossless ($P_e^{(n)} = 0$) MASC design for applications where perfect data reconstruction is required. The second is near-lossless ($P_e^{(n)}$ small but non-zero) code design for use in lossy MASCs.

The interest in near-lossless MASCs is inspired by the discontinuity in the achievable rate region associated with going from near-lossless to truly lossless coding. For example, if p(x,y) > 0 for all (x,y) pairs in the product alphabet, then the optimal instantaneous lossless MASC achieves rates bounded below by H(X) and H(Y) in its descriptions of X and Y, giving a total rate bounded below by H(X) + H(Y). In contrast, the rate of a near-lossless MASC is bounded below by H(X,Y), which may be much smaller than H(X) + H(Y). This example demonstrates that the move from lossless coding to near-lossless coding can give very large rate benefits. While nonzero error probabilities are unacceptable for some applications, they are acceptable on their own for other applications and within lossy MASCs in general (assuming a suitably small error probability). In lossy MASCs, a small increase in the error probability increases the code's expected distortion without causing catastrophic failure.

MASC versus Traditional Compression

To compress the data in a multiple access network using conventional methods, the sources are coded independently, i.e., the two sources X and Y are independently encoded by the two senders and independently decoded at the receiver. This approach is convenient, since it allows direct application of traditional compression techniques to a wide variety of multiple access system applications. However, this approach is inherently inefficient because it disregards the correlation between the two sources.

MASC, on the contrary, takes advantage of the correlation among the sources; it uses independent encoding and joint decoding for the sources. (Joint encoding is prohibited because of the isolated locations of the source encoders or for other reasons.)

For lossless coding, the rates achieved by the traditional approach (independent encoding and decoding) are bounded below by H(X) and H(Y) for the two sources, respectively, i.e., $R_X \ge H(X)$, $R_Y \ge H(Y)$, and $R_X + R_Y \ge H(X) + H(Y)$. The rates achieved by MASC are bounded as follows: $R_X \ge H(X|Y)$, $R_Y \ge H(Y|X)$, and $R_X + R_Y \ge H(X,Y)$. When X and Y are correlated, $H(X) > H(X|Y)$, $H(Y) > H(Y|X)$, and $H(X) + H(Y) > H(X,Y)$. Thus MASCs can generally achieve better performance than the traditional independent coding approach.
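To make the comparison concrete, the Python sketch below computes both sets of bounds for a hypothetical joint p.m.f. The numbers are an assumption chosen for illustration, not taken from the text. Because every pair (x, y) has positive probability in this example, it also illustrates the lossless/near-lossless gap discussed earlier. An instantaneous lossless MASC would need at least H(X) + H(Y) = 2 bits in total, while the H(X,Y) bound is only about 1.47 bits.

```python
import math

# Hypothetical joint p.m.f. of two correlated binary sources (illustrative values only).
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}


def H(probs):
    """Entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal(joint, axis):
    """Marginal p.m.f. of coordinate `axis` (0 for X, 1 for Y)."""
    m = {}
    for pair, p in joint.items():
        m[pair[axis]] = m.get(pair[axis], 0.0) + p
    return m


H_XY = H(joint.values())
H_X = H(marginal(joint, 0).values())
H_Y = H(marginal(joint, 1).values())
H_X_given_Y = H_XY - H_Y  # chain rule: H(X,Y) = H(Y) + H(X|Y)
H_Y_given_X = H_XY - H_X

print(f"independent coding: R_X + R_Y >= {H_X + H_Y:.3f}")
print(f"MASC: R_X >= {H_X_given_Y:.3f}, R_Y >= {H_Y_given_X:.3f}, R_X + R_Y >= {H_XY:.3f}")
```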

Prior Attempts 

A number of prior art attempts have been made to provide optimal codes for multiple access networks. Examples include H. S. Witsenhausen, "The Zero-Error Side Information Problem and Chromatic Numbers," IEEE Transactions on Information Theory, 22:592-593, 1976; A. Kh. Al Jabri and S. Al-Issa, "Zero-Error Codes for Correlated Information Sources," in Proceedings of Cryptography, pages 17-22, Cirencester, UK, December 1997; S. S. Pradhan and K. Ramchandran, "Distributed Source Coding Using Syndromes (DISCUS): Design and Construction," in Proceedings of the Data Compression Conference, pages 158-167, Snowbird, UT, March 1999, IEEE; and Y. Yan and T. Berger, "On Instantaneous Codes for Zero-Error Coding of Two Correlated Sources," in Proceedings of the IEEE International Symposium on Information Theory, page 344, Sorrento, Italy, June 2000, IEEE.

Witsenhausen, Al Jabri, and Yan treat the problem as a side-information problem, where both the encoder and the decoder know X, and the goal is to describe Y using the smallest possible average rate while maintaining the unique decodability of Y given the known value of X. As shown by Yan, neither Witsenhausen's nor Al Jabri's approach is optimal in this scenario. Yan and Berger find a necessary and sufficient condition for the existence of a lossless instantaneous code with a given set of codeword lengths for Y when the alphabet size of X is two. Unfortunately, their approach fails to yield a necessary and sufficient condition for the existence of a lossless instantaneous code when the alphabet size of X is greater than two. Pradhan and Ramchandran tackle the lossless MASC design problem when source Y is guaranteed to be at most a prescribed Hamming distance from source X. Methods for extending this approach to design good codes for more general p.m.f.s p(x,y) are unknown.
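The side-information setting can be illustrated with a small example. When the decoder knows x, two symbols y and y' may share an identical codeword whenever no x gives both of them positive probability. Witsenhausen's chromatic-number formulation colors a graph whose edges join such "confusable" pairs. The Python sketch below uses a hypothetical joint p.m.f. (not from the text) to list the shareable pairs and to decode with one such shared code. The function names are illustrative.

```python
from itertools import combinations

# Hypothetical joint p.m.f. p(x, y); pairs not listed have probability zero.
joint = {
    (0, "a"): 0.2, (0, "b"): 0.1,
    (1, "b"): 0.15, (1, "c"): 0.2,
    (2, "c"): 0.05, (2, "d"): 0.3,
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})


def confusable(y1, y2):
    """True if some x gives both y1 and y2 positive probability."""
    return any(joint.get((x, y1), 0) > 0 and joint.get((x, y2), 0) > 0 for x in xs)


shareable = [(a, b) for a, b in combinations(ys, 2) if not confusable(a, b)]
print(shareable)  # [('a', 'c'), ('a', 'd'), ('b', 'd')]

# One bit per symbol: 'a' and 'c' share "0", 'b' and 'd' share "1".
code_y = {"a": "0", "c": "0", "b": "1", "d": "1"}


def decode_y(x, word):
    """Recover y from its codeword using the side information x known to the decoder."""
    candidates = [y for y, w in code_y.items() if w == word and joint.get((x, y), 0) > 0]
    if len(candidates) != 1:
        raise ValueError("code is not lossless for this x")
    return candidates[0]


print([decode_y(x, code_y[y]) for (x, y) in joint])  # ['a', 'b', 'b', 'c', 'c', 'd']
```

With the side information available, Y is described losslessly with 1 bit per symbol. Four distinct codewords would need 2 bits.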
SUMMARY OF THE INVENTION



Embodiments of the invention present implementations for multiple access source coding 
(MASC). The invention provides a solution for independently encoding individual sources and 
for decoding multiple source data points from the individually encoded streams in a single 
decoder. In a two-source example, the invention provides a way to separately encode samples from data source x and data source y, using no collaboration between the encoders and requiring no knowledge of y by the encoder of x or vice versa. It also provides a way to decode data pairs (x, y) using the individual encoded data streams for both x and y.

Embodiments of the present invention disclosed herein include algorithms for: 

1. optimal lossless coding in multiple access networks (the extension of Huffman 
coding to MASCs); 

2. low complexity, high dimension lossless coding in multiple access networks (the 
extension of arithmetic coding to MASCs); 

3. optimal near-lossless coding in multiple access networks (the extension of the 
Huffman MASC algorithm for an arbitrary non-zero probability of error); 

4. low complexity, high dimensional near-lossless coding in multiple access 
networks (the extension of the arithmetic MASC algorithm for an arbitrary non- 
zero probability of error). 
The algorithmic description includes methods for encoding, decoding, and code design for an arbitrary p.m.f. p(x,y) in each of the above four scenarios.

Other embodiments of the present invention are codes that give some symbols (a) identical descriptions and/or (b) descriptions that violate the prefix condition. Nonetheless, the codes described herein guarantee unique decodability in lossless codes, or in near-lossless codes with $P_e \le \epsilon$ ($\epsilon$ fixed at code design in "near-lossless" codes). Unlike prior art, which only discusses properties (a) and (b), the present invention gives codes that yield both types of descriptions. The present invention also gives a definition of the class of algorithms that can be used to generate codes with properties (a) and (b).

One embodiment of the present invention provides a solution that partitions the source alphabet into optimal partitions and then finds a matched code that is optimal for the given partition, in accordance with the aforementioned definition of the class of algorithms. In one embodiment, the source alphabet is examined to find combinable symbols and to create subsets of combinable symbols. These subsets are then partitioned into optimal groups and joined in a list. The successful groups from the list are then used to create complete and non-overlapping partitions of the alphabet. For each complete and non-overlapping partition, an optimal matched code is generated. The partition whose matched code provides the best rate is selected. In one embodiment, the matched code can be a Huffman code, an arithmetic code, or any other existing form of lossless code.
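A deliberately simplified sketch of this partition-then-match idea follows, in Python. It makes several assumptions. It covers only the side-information case, in which the decoder knows x. It restricts groups to symbols that share one identical codeword; the groups described in this invention can also carry prefix-related descriptions, which the sketch does not model. It enumerates every partition by brute force, which is feasible only for tiny alphabets. Finally, it uses a Huffman code on the group probabilities as the matched code. All names and the joint p.m.f. are hypothetical.

```python
import heapq
from itertools import combinations, count

# Hypothetical joint p.m.f. p(x, y); the decoder is assumed to know x.
joint = {
    (0, "a"): 0.2, (0, "b"): 0.1,
    (1, "b"): 0.15, (1, "c"): 0.2,
    (2, "c"): 0.05, (2, "d"): 0.3,
}
xs = {x for x, _ in joint}
p_y = {}
for (_, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p


def combinable(y1, y2):
    """Symbols may share a codeword if no x gives both positive probability."""
    return not any(joint.get((x, y1), 0) > 0 and joint.get((x, y2), 0) > 0 for x in xs)


def set_partitions(items):
    """Yield every partition of the list `items` as a list of groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition  # `first` in a group of its own
        for i in range(len(partition)):  # or `first` added to an existing group
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the list `probs`."""
    if len(probs) == 1:
        return [1]
    tiebreak = count()
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p0, _, a = heapq.heappop(heap)
        p1, _, b = heapq.heappop(heap)
        for i in a + b:  # every leaf under the merged node moves one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (p0 + p1, next(tiebreak), a + b))
    return lengths


best_rate, best_partition = float("inf"), None
for partition in set_partitions(sorted(p_y)):
    # Keep only partitions whose groups contain mutually combinable symbols.
    if not all(combinable(a, b) for g in partition for a, b in combinations(g, 2)):
        continue
    group_probs = [sum(p_y[y] for y in g) for g in partition]
    rate = sum(p * n for p, n in zip(group_probs, huffman_lengths(group_probs)))
    if rate < best_rate:
        best_rate, best_partition = rate, partition

print(best_partition, round(best_rate, 4))
```

For this p.m.f., the only valid two-group partition, {a, c} and {b, d}, wins with a rate of 1 bit per symbol.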

Embodiments of the present invention can provide lossless and near-lossless compression as a general solution for two kinds of environments. In the first, multiple encoders encode information to be decoded by a single decoder. In the second, one or more encoders encode information to be decoded by a single decoder that has access to side information.
BRIEF DESCRIPTION OF THE DRAWINGS

These and other features, aspects and advantages of the present invention will become 
better understood with regard to the following description, appended claims and accompanying 
drawings where: 

Figure 1 shows the binary trees for the second through fourth codes in Table 1.

Figure 2A illustrates an example Huffman code building process.

Figure 2B illustrates an example of determining the interval for a sequence in arithmetic coding.

Figure 3A shows an example MASC configuration.

Figure 3B shows the achievable rate region of multiple access source coding according to the work of Slepian and Wolf.

Figure 4 is a flow diagram of an embodiment of the present invention. 

Figure 5 is a flow diagram of an embodiment of finding combinable symbols of the 
present invention. 

Figure 6 is a flow diagram of an embodiment for building a list of groups. 

Figure 7 is a flow diagram for constructing optimal partitions. 

Figure 8 is a flow diagram of an embodiment for constructing a partition tree and labeling each node within the tree.

Figure 9 is a block diagram of a side-information joint decoder embodiment of the 
invention. 

Figures 10A-10D illustrate node labeling and coding using the present invention.

Figure 11 is a flow diagram illustrating Huffman codeword generation using the present invention.

Figures 12A-12C illustrate arithmetic coding using the present invention.

Figure 13 illustrates a flow chart for a general coding scheme for an alternate algorithm embodiment.

Figure 14 shows a comparison of three partition trees generated from the various embodiments of the present invention.

Figure 15 is a graph of general lossless and near-lossless MASC results.

Figure 16 is a diagram showing how two groups are combined according to one embodiment of the invention.

Figure 17 is a flow diagram for generating matched code according to an embodiment of the present invention.

Figure 18 is a flow diagram for building matched codes that approximate the optimal 
length function according to another embodiment of the present invention. 
DETAILED DESCRIPTION OF THE INVENTION



Embodiments of the present invention relate to the implementation of lossless and near- 
lossless source coding for multiple access networks. In the following description, numerous 
specific details are set forth to provide a more thorough description of embodiments of the 
invention. It will be apparent, however, to one skilled in the art, that embodiments of the 
invention may be practiced without these specific details. In other instances, well known 
features have not been described in detail so as not to obscure the invention. 

The invention provides a general data compression scheme for encoding and decoding of 
data from multiple sources that have been encoded independently. The invention can also be 
implemented in a side-information environment where one of the data sources is known to the 
decoder. Although the invention is a general solution for multiple data sources, the invention is 
described by an example of a two data source network. 

The present invention is described herein by way of an example with two data sources X and Y that provide data streams $x_1, x_2, x_3, \ldots, x_n$ and $y_1, y_2, y_3, \ldots, y_n$, respectively, to dedicated encoders. The encoded streams are provided to a single decoder that can produce decoded data pairs $(x^n, y^n)$. Before describing embodiments of the invention, a summary of the notation used in the example of the MASC problem is provided.






Notations in MASC 

In describing the multiple access source coding (MASC) problem, we consider finite-alphabet memoryless data sources X and Y with joint probability mass function p(x,y) on alphabet $\mathcal{X} \times \mathcal{Y}$. We use $p_X(x)$ and $p_Y(y)$ to denote the marginals of p(x,y) with respect to X and Y. (The subscripts are dropped when they are obvious from the argument, giving $p_X(x) = p(x)$ and $p_Y(y) = p(y)$.) A lossless instantaneous MASC for joint source (X,Y) consists of two encoders $\gamma_X: \mathcal{X} \to \{0,1\}^*$ and $\gamma_Y: \mathcal{Y} \to \{0,1\}^*$ and a decoder $\gamma^{-1}: \{0,1\}^* \times \{0,1\}^* \to \mathcal{X} \times \mathcal{Y}$. Here a first dedicated encoder $\gamma_X$ encodes data source X, which has alphabet $\mathcal{X}$, into strings of 0s and 1s (bits). A second dedicated encoder $\gamma_Y$ does the same for data source Y, which has alphabet $\mathcal{Y}$. A single decoder $\gamma^{-1}$ then recovers X and Y from the encoded data streams. $\gamma_X(x)$ and $\gamma_Y(y)$ denote the binary descriptions of x and y, and the probability of decoding error is $P_e = \Pr(\gamma^{-1}(\gamma_X(X), \gamma_Y(Y)) \neq (X, Y))$, i.e., the probability that the decoded data differ from the original data. Here we focus on instantaneous codes. For any input sequences $x_1, x_2, x_3, \ldots$ and $y_1, y_2, y_3, \ldots$ with $p(x_1, y_1) > 0$, the instantaneous decoder reconstructs $(x_1, y_1)$ by reading only the first $|\gamma_X(x_1)|$ bits from $\gamma_X(x_1)\gamma_X(x_2)\gamma_X(x_3)\cdots$ and the first $|\gamma_Y(y_1)|$ bits from $\gamma_Y(y_1)\gamma_Y(y_2)\gamma_Y(y_3)\cdots$, without prior knowledge of these lengths.

The present invention provides coding schemes for the extension of Huffman coding to MASCs (for optimal lossless coding and for near-lossless coding) and the extension of arithmetic coding to MASCs (for low-complexity, high-dimension lossless coding and for near-lossless coding). The embodiments of the invention are described with respect to two environments: one, lossless side-information coding, where one of the data sources is known to the decoder; and another, the general case, where neither of the sources must be independently decodable.

To further describe this embodiment of the present invention, we begin by developing terminology for describing, for a particular code, which symbols from $Y$ have binary descriptions that are identical and which have binary descriptions that are prefixes of each other. This embodiment of the present invention defines a "group" for which codes can be designed to describe its nested structure instead of designing codes for symbols. The invention also defines partitions, which are optimal for a particular coding scheme (Huffman coding or arithmetic coding). Finally, the invention describes matched codes, which satisfy particular properties for partitions and coding schemes. The goal in code design in the present application is to find the code that minimizes $\lambda R_X + (1 - \lambda) R_Y$ for an arbitrary value of $\lambda \in [0, 1]$, where $R_X$ and $R_Y$ are the rates used to describe $X$ and $Y$. The result is codes with intermediate values of $R_X$ and $R_Y$. In some cases the goal is to design a code that minimizes $\lambda R_X + (1 - \lambda) R_Y$ with probability of error no greater than $P_e$.

Figure 4 is a flow diagram that describes one embodiment of the invention. At step 401 the alphabet of symbols generated by the sources is obtained. These symbols are organized into combinable subsets of symbols at step 402. These subsets are such that there is no ambiguity between subsets, as will be explained below. At step 403 the subsets are formed into optimal groups. These optimal groups are listed at step 404. The groups are used to find and define optimal partitions at step 405 that are complete and non-overlapping trees of symbols. The successful partitions are used to generate matched codes at step 406, using either arithmetic or Huffman codes. One skilled in the art will recognize that lossless codes other than Huffman and arithmetic codes can be utilized as well. At step 407, the partition whose matched code has the best rate is selected and used for the MASC solution.

Lossless Side-Information Coding 

One embodiment of the present invention presents an implementation for lossless side-information source coding. This problem is a special case of the general lossless MASC problem. (In a general MASC, the decoder has to decode both sources (i.e., X and Y) without knowing either one.) By contrast, in the side-information application, one of the data sources is known to the decoder. The goal is to find an optimal way to encode one of the data sources given that the other source is known.

The invention will be described first in connection with a lossless, side-information MASC solution. Later we describe other embodiments of the invention for a lossless general MASC solution, and embodiments for near-lossless side-information and general MASC solutions.

Figure 9 shows an example side-information multiple access network. Side-information $X$ is perfectly known to the decoder 902 (or losslessly described using an independent code on $X$), and the aim is to describe $Y$ efficiently using an encoder 901 that does not know $X$. This scenario describes MASCs where $\gamma_X$ encodes $X$ using a traditional code for p.m.f. $\{p(x)\}_{x \in \mathcal{X}}$ and encoder $\gamma_Y$ encodes $Y$ assuming that the decoder decodes $X$ before decoding $Y$. In this case, if the decoder 902 can correctly reconstruct $y_1$ by reading only the first $|\gamma_Y(y_1)|$ bits of the description of the Y data stream $\gamma_Y(y_1)\gamma_Y(y_2)\gamma_Y(y_3)$ from encoder 901 (without prior knowledge of these lengths), then the code $\gamma_Y$ is a lossless instantaneous code for $Y$ given $X$, or a lossless instantaneous side-information code. Note that the side-information, as shown in the figure, comes from an external source to decoder 902. This external source can come from a wide variety of places. For example, it is possible that the decoder already has side information embedded within it. Another example is that the external source is a data stream from another encoder similar to encoder 901.

A necessary and sufficient condition for $\gamma_Y$ to be a lossless instantaneous code for $Y$ given side information $X$ is: for each $x \in \mathcal{X}$, $y, y' \in A_x$ with $y \neq y'$ implies that $\gamma_Y(y)$ and $\gamma_Y(y')$ satisfy the prefix condition (that is, neither binary codeword is a prefix of the other codeword), where $A_x = \{y \in \mathcal{Y} : p(x, y) > 0\}$.
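The condition above can be checked mechanically. The following Python sketch is illustrative only: the function name `is_side_info_instantaneous`, the dictionary representations of $p(x, y)$ and of the code, and the toy p.m.f. are assumptions made for this example, not part of the described invention.

```python
from itertools import combinations


def is_side_info_instantaneous(pmf, code):
    """Check the prefix condition on A_x = {y : p(x, y) > 0} for every x.

    pmf maps (x, y) pairs to probabilities; code maps each y to a bit string.
    Identical codewords inside one A_x also violate the condition.
    """
    xs = {x for (x, _) in pmf}
    for x in xs:
        a_x = [y for (xx, y), p in pmf.items() if xx == x and p > 0]
        for y1, y2 in combinations(a_x, 2):
            c1, c2 = code[y1], code[y2]
            if c1.startswith(c2) or c2.startswith(c1):
                return False
    return True


# Hypothetical toy p.m.f.: A_0 = {a, b} and A_1 = {c}.
toy_pmf = {(0, "a"): 0.3, (0, "b"): 0.2, (1, "c"): 0.5}
print(is_side_info_instantaneous(toy_pmf, {"a": "0", "b": "1", "c": "0"}))   # True
print(is_side_info_instantaneous(toy_pmf, {"a": "0", "b": "01", "c": "1"}))  # False
```

In the first call, `a` and `c` share the codeword `0`, which is allowed because they never appear in the same $A_x$.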

It is important to note that instantaneous coding in a side-information MASC requires only that $\{\gamma_Y(y) : y \in A_x\}$ be prefix-free for each $x \in \mathcal{X}$, and not that $\{\gamma_Y(y) : y \in \mathcal{Y}\}$ be prefix-free, as would be required for instantaneous coding if no side-information were available to the decoder. This is because once the decoder knows $X$, it eliminates all $y' \notin A_X$ (since $y' \notin A_X$ implies $p(X, y') = 0$). Since all codewords for $y \in A_X$ satisfy the prefix condition, the decoder can use its knowledge of $X$ to instantaneously decode $Y$.

Thus the optimal code may violate the prefix condition either by giving identical descriptions to two symbols (having two $y$ symbols be encoded by the same codeword: $\gamma_Y(y) = \gamma_Y(y')$ for some $y \neq y'$) or by giving one symbol a description that is a proper prefix of the description of some other symbol. We write $\gamma_Y(y) \preceq \gamma_Y(y')$ if the description of $y$ is a prefix of the description of $y'$, and $\gamma_Y(y) \prec \gamma_Y(y')$ if $\gamma_Y(y)$ is a proper prefix of $\gamma_Y(y')$, meaning we disallow the case of $\gamma_Y(y) = \gamma_Y(y')$.



Invention Operation 



We will illustrate the operation of the present invention with the data set of Table 3. Table 3 gives a sample joint probability distribution for sources X and Y, with alphabets $\mathcal{X} = \mathcal{Y} = \{a_0, a_1, \ldots, a_7\}$.



Table 3

x/y   a0     a1     a2     a3     a4     a5     a6     a7
a0    0.04   0      0.15   0      0      0      0      0
a1    0      0.04   0      0.05   0.06   0      0      0
a2    0.04   0      0.05   0      0      0      0.01   0
a3    0.02   0      0      0.06   0      0.01   0      0
a4    0      0.05   0      0      0.05   0.02   0      0
a5    0      0.1    0      0      0      0.03   0.06   0
a6    0      0      0      0      0      0      0.02   0.05
a7    0      0      0      0      0      0      0.01   0.08

Combinable Symbols



At step 402 of Figure 4 we find combinable symbols and create subsets of these 
combinable symbols. Figure 5 is a flow diagram that describes the operation of finding 
combinable symbols and creating subsets of step 402. This example is directed to finding the 
combinable symbols of Y data. 

Symbols $y_1, y_2 \in \mathcal{Y}$ can be combined under $p(x, y)$ if $p(x, y_1)p(x, y_2) = 0$ for each $x \in \mathcal{X}$. At step 501 of Figure 5, a symbol $y$ is obtained, and at step 502 we find the set $C_y = \{z \in \mathcal{Y} : z \text{ can be combined with } y \text{ under } p(x, y)\}$. Symbols in set $C_y$ can be combined with symbol $y$ but do not need to be combinable with each other. For example, the set $C_y$ for $a_0$ is $\{a_1, a_4, a_7\}$ (note that $a_1$ and $a_4$ need not be combinable with each other).

In checking combinability, the first $y$ symbol $a_0$ is examined and compared to symbols $a_1$ through $a_7$. $a_0$ is combinable with $a_1$ because $p(x, a_0) \cdot p(x, a_1) = 0$ for all $x \in \mathcal{X}$. However, $a_0$ is not combinable with $a_2$ because $p(x, a_0) \cdot p(x, a_2) > 0$ for $x = a_0$ and $x = a_2$. At step 503 it is determined if each $y$ symbol has been checked and a set $C_y$ has been generated. If not, the system returns to step 501 and repeats for the next $y$ symbol. If all $y$ symbols have been checked at step 503, all of the sets $C_y$ have been generated. Using the example of Table 3, the generated sets $C_y$ for each symbol are shown below in Table 4.
Table 4

y     C_y
a0    a1, a4, a7
a1    a0, a2, a7
a2    a1, a3, a4, a5, a7
a3    a2, a6, a7
a4    a0, a2, a6, a7
a5    a2, a7
a6    a3, a4
a7    a0, a1, a2, a3, a4, a5
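As an illustrative cross-check of Table 4, the sets $C_y$ can be recomputed from Table 3 in a few lines of Python. This is a minimal sketch; the names `P`, `can_combine`, and `combinable_sets` are hypothetical, and the table is stored with rows indexed by $x$ and columns by $y$.

```python
# Table 3: rows are x = a0..a7, columns are y = a0..a7.
P = [
    [0.04, 0, 0.15, 0, 0, 0, 0, 0],
    [0, 0.04, 0, 0.05, 0.06, 0, 0, 0],
    [0.04, 0, 0.05, 0, 0, 0, 0.01, 0],
    [0.02, 0, 0, 0.06, 0, 0.01, 0, 0],
    [0, 0.05, 0, 0, 0.05, 0.02, 0, 0],
    [0, 0.1, 0, 0, 0, 0.03, 0.06, 0],
    [0, 0, 0, 0, 0, 0, 0.02, 0.05],
    [0, 0, 0, 0, 0, 0, 0.01, 0.08],
]
NAMES = [f"a{i}" for i in range(8)]


def can_combine(y1, y2):
    """y1 and y2 are combinable iff p(x, y1) * p(x, y2) == 0 for every x."""
    return all(row[y1] * row[y2] == 0 for row in P)


def combinable_sets():
    """Return C_y for every symbol y (steps 501-503 of Figure 5)."""
    return {
        NAMES[y]: {NAMES[z] for z in range(8) if z != y and can_combine(y, z)}
        for y in range(8)
    }


for y, c_y in combinable_sets().items():
    print(y, sorted(c_y))
```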



Continuing with Figure 5, at step 504 we find the nonempty subsets for each set $C_y$. For example, the nonempty subsets for set $C_y$ of symbol $a_0$ are $\{a_1\}$, $\{a_4\}$, $\{a_7\}$, $\{a_1, a_4\}$, $\{a_1, a_7\}$, $\{a_4, a_7\}$, and $\{a_1, a_4, a_7\}$. At step 505 it is determined if each set $C_y$ has been checked. If not, the system checks the next set $C_y$ at step 504. If all sets $C_y$ have been checked, the process ends at step 506.
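Step 504 amounts to enumerating the nonempty subsets of each $C_y$. A minimal Python sketch, assuming $C_y$ is given as a set of symbol names (the helper name `nonempty_subsets` is hypothetical):

```python
from itertools import combinations


def nonempty_subsets(c_y):
    """Yield every nonempty subset of C_y as a sorted tuple, smallest first."""
    items = sorted(c_y)
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            yield subset


subsets = list(nonempty_subsets({"a1", "a4", "a7"}))  # C_y for y = a0
print(len(subsets))  # 7
print(subsets)
```

A set with $m$ elements has $2^m - 1$ nonempty subsets, so this step grows exponentially with $|C_y|$.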

Groups 

We call symbols $y, y' \in \mathcal{Y}$ "combinable" if there exists a lossless instantaneous side-information code in which $\gamma_Y(y) \preceq \gamma_Y(y')$. If we wish to design a code with $\gamma_Y(y) = \gamma_Y(y')$, then we join those symbols together in a "1-level group." If we wish to give one 1-level group a binary description that is a proper prefix of the binary description of other 1-level groups, then we build a "2-level group." These ideas generalize to M-level groups with $M > 2$.
Figure 6 is a flow diagram of the group generation step 403 and list making step 404 of Figure 4. At step 601 the nonempty subsets for a set $C_y$ generated by step 402 of Figure 4 are obtained. At step 602 the optimal partition is found for each nonempty subset. At step 603 a root is added to the optimal partition to create an optimal group. For example, for an optimal partition of a subset of the set $C_y$ of $a_0$, $a_0$ is added as the root of this optimal partition. This optimal group is added to a list $\mathcal{L}_{\mathcal{Y}}$ at step 604. At step 605 it is determined if all sets have been checked. If not, the system returns to step 601 and gets the nonempty subsets of the next set. If so, the process ends at step 606. After the operation of the steps of Figure 6, we have a list $\mathcal{L}_{\mathcal{Y}}$ that contains optimal groups.

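The mathematical and algorithmic representations of the flow diagrams of Figures 4, 5, and 6 are presented here. Symbols $y_1, y_2 \in \mathcal{Y}$ can be combined under $p(x, y)$ if $p(x, y_1)p(x, y_2) = 0$ for each $x \in \mathcal{X}$. The collection $\mathcal{G} = (y_1, \ldots, y_m)$ is called a 1-level group for $p(x, y)$ if each pair of distinct members $y_i, y_j \in \mathcal{G}$ can be combined under $p(x, y)$. For any $y \in \mathcal{Y}$ and any $p(x, y)$, $(y)$ is a special case of a 1-level group. The tree representation $\mathcal{T}(\mathcal{G})$ for 1-level group $\mathcal{G}$ is a single node representing all members of $\mathcal{G}$.

A 2-level group for $p(x, y)$, denoted by $\mathcal{G} = (\mathcal{R} : \mathcal{C}(\mathcal{R}))$, comprises a root $\mathcal{R}$ and its children $\mathcal{C}(\mathcal{R})$, where $\mathcal{R}$ is a 1-level group, $\mathcal{C}(\mathcal{R})$ is a set of 1-level groups, and for each $\mathcal{G}' \in \mathcal{C}(\mathcal{R})$, each pair $y_1 \in \mathcal{R}$ and $y_2 \in \mathcal{G}'$ can be combined under $p(x, y)$. Here members of all $\mathcal{G}' \in \mathcal{C}(\mathcal{R})$ are called members of $\mathcal{C}(\mathcal{R})$, and members of $\mathcal{R}$ and $\mathcal{C}(\mathcal{R})$ are called members of $\mathcal{G}$. In the tree representation $\mathcal{T}(\mathcal{G})$ for $\mathcal{G}$, $\mathcal{T}(\mathcal{R})$ is the root of $\mathcal{T}(\mathcal{G})$ and the parent of all subtrees $\mathcal{T}(\mathcal{G}')$ for $\mathcal{G}' \in \mathcal{C}(\mathcal{R})$.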
These ideas generalize to M-level groups. For each subsequent $M > 2$, an M-level group for $p(x, y)$ is a pair $\mathcal{G} = (\mathcal{R} : \mathcal{C}(\mathcal{R}))$ such that for each $\mathcal{G}' \in \mathcal{C}(\mathcal{R})$, each pair $y_1 \in \mathcal{R}$ and $y_2 \in \mathcal{G}'$ can be combined under $p(x, y)$. Here $\mathcal{R}$ is a 1-level group and $\mathcal{C}(\mathcal{R})$ is a set of groups of $M - 1$ or fewer levels, at least one of which is an $(M-1)$-level group. The members of $\mathcal{R}$ and $\mathcal{C}(\mathcal{R})$ together comprise the members of $\mathcal{G} = (\mathcal{R} : \mathcal{C}(\mathcal{R}))$. Again, $\mathcal{T}(\mathcal{R})$ is the root of $\mathcal{T}(\mathcal{G})$ and the parent of all subtrees $\mathcal{T}(\mathcal{G}')$ for $\mathcal{G}' \in \mathcal{C}(\mathcal{R})$. For any $M > 1$, an M-level group is also called a multi-level group.

We use the probability mass function (p.m.f.) in Table 3, with $\mathcal{X} = \mathcal{Y} = \{a_0, a_1, \ldots, a_6, a_7\}$, to illustrate these concepts. For this p.m.f., $(a_0, a_4, a_7)$ is one example of a 1-level group since $p(x, a_0)p(x, a_4) = 0$, $p(x, a_0)p(x, a_7) = 0$, and $p(x, a_4)p(x, a_7) = 0$ for all $x \in \mathcal{X}$. (This is seen in Table 4 as the entries for $a_0$.) The pair $(a_4, a_7)$, a subset of $(a_0, a_4, a_7)$, is a distinct 1-level group for $p(x, y)$. The tree representation for any 1-level group is a single node.

An example of a 2-level group for $p(x, y)$ is $\mathcal{G}_2 = ((a_4) : \{(a_0), (a_2, a_7), (a_6)\})$. In this case the root node $\mathcal{R} = (a_4)$ and $\mathcal{C}(\mathcal{R}) = \{(a_0), (a_2, a_7), (a_6)\}$. The members of $\mathcal{C}(\mathcal{R})$ are $\{a_0, a_2, a_6, a_7\}$; the members of $\mathcal{G}_2$ are $\{a_0, a_2, a_4, a_6, a_7\}$. Here $\mathcal{G}_2$ is a 2-level group since symbol $a_4$ can be combined with each of $a_0, a_2, a_6, a_7$, and $(a_0)$, $(a_2, a_7)$, $(a_6)$ are 1-level groups under p.m.f. $p(x, y)$. The tree representation $\mathcal{T}(\mathcal{G}_2)$ is a 2-level tree. The tree root has three children, each of which is a single node.

An example of a 3-level group for $p(x, y)$ is $\mathcal{G}_3 = ((a_7) : \{(a_0), (a_1), ((a_2) : \{(a_4), (a_5)\})\})$. In $\mathcal{T}(\mathcal{G}_3)$, the root $\mathcal{T}(a_7)$ of the three-level group has three children: the first two children are nodes $\mathcal{T}(a_0)$ and $\mathcal{T}(a_1)$; the third child is a 2-level tree with root node $\mathcal{T}(a_2)$ and children $\mathcal{T}(a_4)$ and $\mathcal{T}(a_5)$. The tree representation $\mathcal{T}(\mathcal{G}_3)$ is a 3-level tree.
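The recursive group definition can be tested directly. The Python sketch below is an illustration under stated assumptions, not the patent's implementation: it stores each column support $\{x : p(x, y) > 0\}$ of Table 3 (with $x = a_i$ written as `i`), writes a group as a hypothetical pair `(root, children)` with `root` a tuple of symbols, and checks the definition for the 2-level and 3-level examples above.

```python
# Column supports {x : p(x, y) > 0} of Table 3, with x = a_i written as i.
SUPPORT = {
    "a0": {0, 2, 3}, "a1": {1, 4, 5}, "a2": {0, 2}, "a3": {1, 3},
    "a4": {1, 4}, "a5": {3, 4, 5}, "a6": {2, 5, 6, 7}, "a7": {6, 7},
}


def can_combine(y1, y2):
    """p(x, y1) p(x, y2) = 0 for every x, i.e. the supports are disjoint."""
    return SUPPORT[y1].isdisjoint(SUPPORT[y2])


def members(group):
    """All members of a group written as (root, children)."""
    root, children = group
    out = set(root)
    for child in children:
        out |= members(child)
    return out


def levels(group):
    """Number of levels: 1 for a 1-level group (no children)."""
    _, children = group
    return 1 + max((levels(c) for c in children), default=0)


def is_group(group):
    """Recursive check of the M-level group definition."""
    root, children = group
    # The root must be a 1-level group: its members pairwise combinable.
    for i, a in enumerate(root):
        for b in root[i + 1:]:
            if not can_combine(a, b):
                return False
    for child in children:
        # Each root member must be combinable with every member of the child.
        if not all(can_combine(r, m) for r in root for m in members(child)):
            return False
        # The child must itself be a (possibly multi-level) group.
        if not is_group(child):
            return False
    return True


g2 = (("a4",), [(("a0",), []), (("a2", "a7"), []), (("a6",), [])])
g3 = (("a7",), [(("a0",), []), (("a1",), []),
                (("a2",), [(("a4",), []), (("a5",), [])])])
print(is_group(g2), levels(g2))  # True 2
print(is_group(g3), levels(g3))  # True 3
print(is_group((("a0",), [(("a2",), [])])))  # False
```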

Optimal Groups 

The partition design procedure for groups is recursive, solving for optimal partitions on sub-alphabets in the solution of the optimal partition on $\mathcal{Y}$. For any alphabet $\mathcal{Y}' \subseteq \mathcal{Y}$, the procedure begins by making a list $\mathcal{L}_{\mathcal{Y}'}$ of all (single- or multi-level) groups that can appear in an optimal partition $\mathcal{P}(\mathcal{Y}')$ of $\mathcal{Y}'$ for $p(x, y)$. The list is initialized as $\mathcal{L}_{\mathcal{Y}'} = \{(y) : y \in \mathcal{Y}'\}$. For each symbol $y \in \mathcal{Y}'$, we wish to add to the list all groups that have $y$ as one member of the root and some subset of $\mathcal{Y}'$ as members. To do that, we find the set $C_y = \{z \in \mathcal{Y}' : z \text{ can be combined with } y \text{ under } p(x, y)\}$. For each nonempty subset $\mathcal{S} \subseteq C_y$ such that $\mathcal{L}_{\mathcal{Y}'}$ does not yet contain a group with elements $\mathcal{S} \cup \{y\}$, we find the optimal partition $\mathcal{P}(\mathcal{S})$ of $\mathcal{S}$ for $p(x, y)$. We construct a new multi-level group $\mathcal{G}$ with elements $\mathcal{S} \cup \{y\}$ by adding $y$ to the empty root of $\mathcal{T}(\mathcal{P}(\mathcal{S}))$ if $\mathcal{P}(\mathcal{S})$ contains more than one group, or to the root of the single group in $\mathcal{P}(\mathcal{S})$ otherwise. Notice that $y$ can be the prefix of any symbol in $\mathcal{S}$. Since $y$ can be combined with all members of $\mathcal{S}$, $y$ must reside at the root of the optimal partition of $\mathcal{S} \cup \{y\}$; thus $\mathcal{G}$ is optimal not only among all groups in $\{\mathcal{G}' : \text{members of } \mathcal{G}' \text{ are } \mathcal{S} \cup \{y\} \text{ and } y \text{ is at the root of } \mathcal{G}'\}$ but among all groups in $\{\mathcal{G}' : \text{members of } \mathcal{G}' \text{ are } \mathcal{S} \cup \{y\}\}$. Group $\mathcal{G}$ is added to $\mathcal{L}_{\mathcal{Y}'}$, and the process continues.

When this is done, the list of optimal groups (step 404 of Figure 4) is complete.
Optimal Partitions Design

After the list of optimal groups has been created, it is used to create optimal (complete and non-overlapping) partitions. (A more thorough partition definition will be introduced in a later section titled "Optimal Partition: Definition and Properties.") Complete and non-overlapping means that all symbols are included but none is included more than once.

Referring to Figure 7, the steps for accomplishing this are shown. At step 700 we initialize $i$ equal to 1. At step 701 we initialize an empty partition $\mathcal{P}'$ and set $j = i + 1$. At step 702 we add the $i$th group from $\mathcal{L}_{\mathcal{Y}}$ to $\mathcal{P}'$. At step 703 we check to see if the $j$th group overlaps or is combinable with existing groups in $\mathcal{P}'$. If so, we increment $j$ and return to step 703. If not, the $j$th group is added to $\mathcal{P}'$ at step 705. At step 706 we check to see if $\mathcal{P}'$ is complete. If not, $j$ is incremented at step 704 and the process returns to step 703. If $\mathcal{P}'$ is complete, then at step 707 we check whether $i$ indexes the last group in $\mathcal{L}_{\mathcal{Y}}$. If so, we make a list of successful partitions at step 708. If not, we increment $i$ and return to step 701.

The operations of Figure 7 are performed mathematically as follows. A partition $\mathcal{P}(\mathcal{Y})$ on $\mathcal{Y}$ for p.m.f. $p(x, y)$ is a complete and non-overlapping set of groups. That is, $\mathcal{P}(\mathcal{Y}) = \{\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_m\}$ satisfies $\bigcup_{i=1}^{m} \mathcal{G}_i = \mathcal{Y}$ and $\mathcal{G}_j \cap \mathcal{G}_k = \emptyset$ for any $j \neq k$, where each $\mathcal{G}_i \in \mathcal{P}(\mathcal{Y})$ is a group for $p(x, y)$, and $\mathcal{G}_j \cup \mathcal{G}_k$ and $\mathcal{G}_j \cap \mathcal{G}_k$ refer to the union and intersection, respectively, of the members of $\mathcal{G}_j$ and $\mathcal{G}_k$. The tree representation of a partition is called a partition tree. The partition tree $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$ for partition $\mathcal{P}(\mathcal{Y}) = \{\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_m\}$ is built as follows: first, construct the tree representation for each $\mathcal{G}_i$; then, link the roots of all $\mathcal{T}(\mathcal{G}_i)$, $i \in \{1, \ldots, m\}$, to a single node, which is defined as the root $r$ of $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$. A partition tree is not necessarily a regular k-ary tree; the number of children at each node depends on the specific multi-level group.

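After constructing the above list of groups, we recursively build the optimal partition of $\mathcal{Y}'$ for $p(x, y)$. If any group $\mathcal{G} \in \mathcal{L}_{\mathcal{Y}'}$ contains all of the elements of $\mathcal{Y}'$, then $\mathcal{P}(\mathcal{Y}') = \{\mathcal{G}\}$ is the optimal partition on $\mathcal{Y}'$. Otherwise, the algorithm systematically builds a partition, adding one group at a time from $\mathcal{L}_{\mathcal{Y}'}$ to set $\mathcal{P}(\mathcal{Y}')$ until $\mathcal{P}(\mathcal{Y}')$ is a complete partition. For $\mathcal{G} \in \mathcal{L}_{\mathcal{Y}'}$ to be added to $\mathcal{P}(\mathcal{Y}')$, it must satisfy: (1) $\mathcal{G} \cap \mathcal{G}' = \emptyset$; and (2) $\mathcal{G}$ and $\mathcal{G}'$ cannot be combined (see Theorem 4 for arithmetic or Theorem 5 for Huffman coding), for all $\mathcal{G}' \in \mathcal{P}(\mathcal{Y}')$.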
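The following Python sketch enumerates complete, non-overlapping selections from a candidate list by exhaustive backtracking. It is a simplified stand-in for the Figure 7 procedure under stated assumptions: groups are reduced to their member sets, and the additional "cannot be combined" test between groups (condition (2), Theorems 4 and 5) is not modelled. The function name `complete_partitions` and the candidate list are hypothetical.

```python
def complete_partitions(groups, alphabet):
    """Return every selection of candidate groups that covers `alphabet`
    exactly once (complete and non-overlapping).

    Each group is given by its member set. Condition (2) of the text,
    that the chosen groups cannot be combined, is NOT checked here.
    """
    alphabet = frozenset(alphabet)
    found = []

    def extend(chosen, covered, start):
        if covered == alphabet:          # complete
            found.append(list(chosen))
            return
        for i in range(start, len(groups)):
            g = groups[i]
            if covered.isdisjoint(g):    # non-overlapping
                chosen.append(g)
                extend(chosen, covered | g, i + 1)
                chosen.pop()

    extend([], frozenset(), 0)
    return found


Y = [f"a{i}" for i in range(8)]
candidates = [
    frozenset({"a3", "a6"}),                          # 1-level group (a3, a6)
    frozenset({"a7", "a0", "a1", "a2", "a4", "a5"}),  # members of G_3
    frozenset({"a3"}),
    frozenset({"a6"}),
]
for p in complete_partitions(candidates, Y):
    print([sorted(g) for g in p])
```

With these candidates the search finds two complete partitions: $\{(a_3, a_6), \mathcal{G}_3\}$ and $\{\mathcal{G}_3, (a_3), (a_6)\}$; choosing between them is the job of the matched-code comparison described below.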

Figure 10A gives an example of a partition tree from the example of Table 3. In this case the partition $\mathcal{P}(\mathcal{Y}) = \{(a_3, a_6), \mathcal{G}_3\}$. This indicates that the root node has two children: one is a 1-level group $\mathcal{T}(a_3, a_6)$, and the other is a 3-level group consisting of root node $\mathcal{T}(a_7)$ with children $\mathcal{T}(a_0)$, $\mathcal{T}(a_1)$, and $\mathcal{T}(a_2)$, where $\mathcal{T}(a_2)$ is the root for its children $\mathcal{T}(a_4)$ and $\mathcal{T}(a_5)$.

As a prelude to generating matched codes for optimal partitions, the branches of a partition are labeled. We label the branches of a partition tree as follows. For any 1-level group $\mathcal{G}$ at depth $d$ in $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$, let $n$ describe the $d$-step path from root $r$ to node $\mathcal{T}(\mathcal{G})$ in $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$. We refer to $\mathcal{G}$ by describing this path. Thus $\mathcal{T}(n) = \mathcal{T}(\mathcal{G})$. For notational simplicity, we sometimes substitute $n$ for $\mathcal{T}(n)$ when it is clear from the context that we are talking about the node rather than the 1-level group at that node (e.g. $n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))$ rather than $\mathcal{T}(n) \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))$). To make the path descriptions unique, we fix an order on the descendants of each node and number them from left to right. Thus $n$'s children are labeled as $n1, n2, \ldots, nK(n)$, where $nk$ is a vector created by concatenating $k$ to $n$, and $K(n)$ is the number of children descending from $n$. The labeled partition tree for Figure 10A appears in Figure 10B.

The node probability $q(n)$ of a 1-level group $n$ with $n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))$ is the sum of the probabilities of that group's members. The subtree probability $Q(n)$ of the 1-level group at $n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))$ is the sum of the probabilities of $n$'s members and descendants. In Figure 10B, $q(23) = p_Y(a_2)$ and $Q(23) = p_Y(a_2) + p_Y(a_4) + p_Y(a_5)$.
Referring to Figure 10B, the root node is labeled "r", and the first level below, comprising a pair of children nodes, is numbered "1" and "2" from left to right as per the convention described above. For the children of node "2", the concatenation and left-to-right conventions result in the three children nodes being labeled "21", "22", and "23", respectively. Accordingly, the children of node "23" are labeled "231" and "232".
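A small Python sketch of the labeling convention and of $q(n)$ and $Q(n)$ for the partition tree of Figure 10B, using the column sums of Table 3 as $p_Y$. The dictionary-of-labels representation and the helper names are assumptions of this example; finding descendants by string prefix only works because no node here has more than nine children.

```python
# Labeled partition tree of Figure 10B: label -> members of that 1-level group.
TREE = {
    "1": ("a3", "a6"),
    "2": ("a7",),
    "21": ("a0",),
    "22": ("a1",),
    "23": ("a2",),
    "231": ("a4",),
    "232": ("a5",),
}
# Marginal p_Y(y) = column sums of Table 3.
P_Y = {"a0": 0.10, "a1": 0.19, "a2": 0.20, "a3": 0.11,
       "a4": 0.11, "a5": 0.06, "a6": 0.10, "a7": 0.13}


def q(n):
    """Node probability: sum of p_Y over the members of node n."""
    return sum(P_Y[y] for y in TREE[n])


def Q(n):
    """Subtree probability: node n plus all of its descendants
    (the root r is the empty label "")."""
    return sum(q(m) for m in TREE if m.startswith(n))


def children(n):
    """Labels n1, n2, ..., nK(n) of the children of n, left to right."""
    return sorted(m for m in TREE if len(m) == len(n) + 1 and m.startswith(n))


print(children(""), children("2"))           # ['1', '2'] ['21', '22', '23']
print(round(q("23"), 2), round(Q("23"), 2))  # 0.2 0.37
```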

Matched Code Generation 

After creating partitions, the present invention determines the optimal partition by generating a matched code for each partition. The partition whose matched code has the best rate (of compression) is the partition to use for the MASC solution. These steps are described in Figure 8.

Referring to Figure 8, at step 801 a partition tree is constructed for each partition. (Note that this step is described above.) At step 802 the order of descendants is fixed and numbered from left to right. At step 803, the node at each level is labeled with a concatenation vector. Thus $n$'s children are labeled as $n1, n2, \ldots, nK(n)$, where $nk$ is a vector created by concatenating $k$ to $n$, and $K(n)$ is the number of children descending from $n$. The labeled partition tree for Figure 10A appears in Figure 10B. At step 804 a matched code is generated for the partition. This matched code can be generated, for example, by Huffman coding or arithmetic coding.

A matched code for a partition is defined as follows. A matched code $\gamma_Y$ for partition $\mathcal{P}(\mathcal{Y})$ is a binary code such that for any node $n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))$ and symbols $y_1, y_2 \in n$ and $y_3 \in nk$, $k \in \{1, \ldots, K(n)\}$: (1) $\gamma_Y(y_1) = \gamma_Y(y_2)$; (2) $\gamma_Y(y_1) \prec \gamma_Y(y_3)$; (3) $\{\gamma_Y(nk) : k \in \{1, \ldots, K(n)\}\}$ is prefix-free. We here focus on codes with a binary channel alphabet $\{0, 1\}$. The extension to codes with other finite channel alphabets is straightforward, and the present invention is not limited to a binary channel alphabet. (We use $\gamma_Y(n)$ interchangeably with $\gamma_Y(y)$ for any $y \in n$.) If symbol $y \in \mathcal{Y}$ belongs to 1-level group $\mathcal{G}$, then $\gamma_Y(y)$ describes the path in $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$ from $r$ to $\mathcal{T}(\mathcal{G})$; the path description is a concatenated list of step descriptions, where the step from $n$ to $nk$, $k \in \{1, \ldots, K(n)\}$, is described using a prefix code on $\{1, \ldots, K(n)\}$. An example of a matched code for the partition of Figure 10A appears in Figure 10C, where the codeword for each node is indicated in parentheses. Figure 17 shows how a matched code is generated according to one embodiment of the invention. In step 1701, the process begins at the root of the tree. Then at step 1702, the prefix code for each node's offspring is designed. Finally, at step 1703 the ancestors' codewords are concatenated to form the resulting matched code.
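Properties (2) and (3) of the definition can be verified on a labeled tree. In this hypothetical Python sketch, `codewords` maps node labels (the root is the empty label) to binary strings; property (1) holds by construction because all members of a node share that node's codeword. The example codewords are those obtained by concatenating the step codes $\{0, 1\}$, $\{00, 01, 1\}$ and $\{0, 1\}$ from the matched Huffman example for Figure 10A.

```python
from itertools import combinations


def is_matched_code(codewords):
    """Check matched-code properties (2) and (3) on a labeled partition tree.

    `codewords` maps node labels (root = "") to binary strings. Property (1)
    holds by construction: all members of a node share the node's codeword.
    """
    for n, c_n in codewords.items():
        kids = [m for m in codewords if len(m) == len(n) + 1 and m.startswith(n)]
        # (2) gamma(n) must be a proper prefix of every child's codeword
        # (descendants further down then follow by transitivity).
        for m in kids:
            if not codewords[m].startswith(c_n) or codewords[m] == c_n:
                return False
        # (3) the children's codewords must be mutually prefix-free.
        for m1, m2 in combinations(kids, 2):
            c1, c2 = codewords[m1], codewords[m2]
            if c1.startswith(c2) or c2.startswith(c1):
                return False
    return True


fig10 = {"": "", "1": "0", "2": "1", "21": "100", "22": "101",
         "23": "11", "231": "110", "232": "111"}
print(is_matched_code(fig10))                  # True
print(is_matched_code({**fig10, "23": "10"}))  # False
```

The second call fails because "10" is a prefix of the sibling codeword "100".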

In the above framework, a partition specifies the prefix and equivalence relationships in the binary descriptions of $y \in \mathcal{Y}$. A matched code is any code with those properties. The above definitions enforce the condition that for any matched code, $y_1, y_2 \in A_x$ for some $x \in \mathcal{X}$ implies that $\gamma_Y(y_1) \not\preceq \gamma_Y(y_2)$; that is, $\gamma_Y$ violates the prefix property only when knowing $X$ eliminates all possible ambiguity.

Theorem 1 establishes the equivalence of matched codes and lossless side-information codes.

Theorem 1 Code $\gamma_Y$ is a lossless instantaneous side-information code for $p(x, y)$ if and only if $\gamma_Y$ is a matched code for some partition $\mathcal{P}(\mathcal{Y})$ for $p(x, y)$.
Proof: First we prove that a matched code for partition $\mathcal{P}(\mathcal{Y})$ is a lossless instantaneous side-information code for $Y$. This proof follows from the definition of a matched code. In a matched code for partition $\mathcal{P}(\mathcal{Y})$, only symbols that can be combined can be assigned codewords that violate the prefix condition; thus only symbols that can be combined are indistinguishable using the matched code description. Since symbols $y_1$ and $y_2$ can be combined only if $p(x, y_1)p(x, y_2) = 0$ for all $x \in \mathcal{X}$, for each $x \in \mathcal{X}$ the matched code's codewords for $A_x = \{y \in \mathcal{Y} : p(x, y) > 0\}$ are prefix-free. Thus the decoder can decode the value of $X$ and then losslessly decode the value of $Y$ using the instantaneous code on $A_X$.

Next we prove that a lossless instantaneous side-information code $\gamma_Y$ must be a matched code for some partition on $\mathcal{Y}$ for $p(x, y)$. That is, given $\gamma_Y$, it is always possible to find a partition $\mathcal{P}(\mathcal{Y})$ on $\mathcal{Y}$ for $p(x, y)$ such that $\mathcal{N} = \{\gamma_Y(y) : y \in \mathcal{Y}\}$ describes a matched code for $\mathcal{P}(\mathcal{Y})$.

Begin by building a binary tree $\mathcal{B}$ corresponding to $\mathcal{N}$ as follows. Initialize $\mathcal{B}$ as a fixed-depth binary tree with depth $\max_{y \in \mathcal{Y}} |\gamma_Y(y)|$. For each $y \in \mathcal{Y}$, label the tree node reached by following path $\gamma_Y(y)$ downward from the root of the tree (here '0' and '1' correspond to left and right branches, respectively, in the binary tree). Call a node in $\mathcal{B}$ empty if it does not represent any codeword in $\mathcal{N}$ and it is not the root of $\mathcal{B}$; all other nodes are non-empty. When it is clear from the context, the description of a codeword is used interchangeably with the description of the non-empty node representing it.

Build partition tree $\mathcal{T}$ from binary tree $\mathcal{B}$ by removing all empty nodes except for the root as follows. First, prune from the tree all empty nodes that have no non-empty descendants. Then, working from the leaves to the root, remove all empty nodes except for the root by attaching the children of each such node directly to the parent of that node. The root is left unchanged. In $\mathcal{T}$:

(1) All symbols that are represented by the same codeword in $\mathcal{N}$ reside at the same node of $\mathcal{T}$. Since $\gamma_Y$ is a lossless instantaneous side-information code, any $y_1, y_2$ at the same node in $\mathcal{T}$ can be combined under $p(x, y)$. Hence each non-root node in $\mathcal{T}$ represents a 1-level group.

(2) The binary description of any internal node $n \in \mathcal{T}$ is the prefix of the descriptions of its descendants. Thus for $\gamma_Y$ to be prefix-free on $A_x$ for each $x \in \mathcal{X}$, it must be possible to combine $n$ with any of its descendants to ensure lossless decoding. Thus $n$ and its descendants form a multi-level group, whose root $\mathcal{R}$ is the 1-level group represented by $n$. In this case, $\mathcal{C}(\mathcal{R})$ is the set of (possibly multi-level) groups descending from $n$ in $\mathcal{T}$.

(3) The set of codewords descending from the same node satisfies the prefix condition.

Thus $\mathcal{T}$ is a partition tree for some partition $\mathcal{P}(\mathcal{Y})$ for $p(x, y)$, and $\mathcal{N}$ is a matched code for $\mathcal{P}(\mathcal{Y})$. □

Given an arbitrary partition $\mathcal{P}(\mathcal{Y})$ for $p(x, y)$, we wish to design the optimal matched code for $\mathcal{P}(\mathcal{Y})$. In traditional lossless coding, the optimal description lengths are $l^*(y) = -\log p(y)$ for all $y \in \mathcal{Y}$ if those lengths are all integers. Theorem 2 gives the corresponding result for lossless side-information codes on a fixed partition $\mathcal{P}(\mathcal{Y})$.

Theorem 2 Given partition $\mathcal{P}(\mathcal{Y})$ for $p(x, y)$, the optimal matched code for $\mathcal{P}(\mathcal{Y})$ has description lengths $l^*_{\mathcal{P}(\mathcal{Y})}(r) = 0$ and

$$ l^*_{\mathcal{P}(\mathcal{Y})}(nk) = l^*_{\mathcal{P}(\mathcal{Y})}(n) + \log \frac{\sum_{j=1}^{K(n)} Q(nj)}{Q(nk)} $$

for all $n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))$ and $k \in \{1, \ldots, K(n)\}$ if those lengths are all integers. Here $l^*_{\mathcal{P}(\mathcal{Y})}(n) = l$ implies $l^*_Y(y) = l$ for all symbols $y \in \mathcal{Y}$ that are in 1-level group $n$.



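Proof: For each internal node $n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))$, the codewords $\{\gamma_Y(nk) : k \in \{1, \ldots, K(n)\}\}$ share a common prefix and satisfy the prefix condition. Deleting the common prefix from each codeword in $\{\gamma_Y(nk) : k = 1, \ldots, K(n)\}$ yields a collection of codeword suffixes that also satisfy the prefix condition. Thus if $l_{\mathcal{P}}(n)$ is the description length for $n$, then the collection of lengths $\{l_{\mathcal{P}}(nk) - l_{\mathcal{P}}(n) : k = 1, \ldots, K(n)\}$ satisfies the Kraft Inequality:

$$ \sum_{k=1}^{K(n)} 2^{-(l_{\mathcal{P}}(nk) - l_{\mathcal{P}}(n))} \le 1. $$

(Here $l_{\mathcal{P}}(r) = 0$ by definition.) We wish to minimize the expected length

$$ \bar{l}(\mathcal{P}(\mathcal{Y})) = \sum_{n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))} q(n)\, l_{\mathcal{P}}(n) $$

of the matched code over all $l_{\mathcal{P}}(n)$ that satisfy

$$ \sum_{k=1}^{K(n)} 2^{-(l_{\mathcal{P}}(nk) - l_{\mathcal{P}}(n))} = 1, \quad \forall n \in \mathcal{I}(\mathcal{T}(\mathcal{P}(\mathcal{Y}))) = \{n \in \mathcal{T}(\mathcal{P}(\mathcal{Y})) : K(n) > 0\}. $$

(We here neglect the integer constraint on code lengths.) If $u(n) = 2^{-l_{\mathcal{P}}(n)}$, then

$$ \bar{l}(\mathcal{P}(\mathcal{Y})) = -\sum_{n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))} q(n) \log u(n) $$

and $u(n)$ must satisfy

$$ \sum_{k=1}^{K(n)} u(nk) = u(n), \quad \forall n \in \mathcal{I}(\mathcal{T}(\mathcal{P}(\mathcal{Y}))). $$

Since $\bar{l}(\mathcal{P}(\mathcal{Y}))$ is a convex function of $u(n)$, the constrained minimization can be posed as an unconstrained minimization using the Lagrangian

$$ J = -\sum_{n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))} q(n) \log u(n) + \sum_{n \in \mathcal{I}(\mathcal{T}(\mathcal{P}(\mathcal{Y})))} \lambda(n) \Big( u(n) - \sum_{k=1}^{K(n)} u(nk) \Big). $$

Differentiating with respect to each $u(nk)$ and setting the derivative to 0, we get

$$ \begin{cases} -\dfrac{q(nk)}{u(nk)} \log e + \lambda(nk) - \lambda(n) = 0, & \text{if } nk \text{ is an internal node;} \\[1ex] -\dfrac{q(nk)}{u(nk)} \log e - \lambda(n) = 0, & \text{if } nk \text{ is a leaf node.} \end{cases} \qquad (1) $$

First consider all $nk$'s at the lowest level of the tree that have the same parent $n$. Since these nodes are leaves, $q(nk) = Q(nk)$, and we have

$$ \frac{q(nk)}{u(nk)} \log e = \frac{Q(nk)}{u(nk)} \log e = -\lambda(n), \quad k = 1, \ldots, K(n); \qquad \sum_{k=1}^{K(n)} u(nk) = u(n). \qquad (2) $$

Thus we get

$$ u(nk) = \frac{Q(nk)}{\sum_{j=1}^{K(n)} Q(nj)}\, u(n), \quad \forall k = 1, \ldots, K(n), $$

giving

$$ \lambda(n) = -\frac{\sum_{j=1}^{K(n)} Q(nj)}{u(n)} \log e. \qquad (3) $$

Other nodes at the lowest level are processed in the same way.

Now fix some $n_1$ two levels up from the tree bottom, and consider any node $n_1k$.

Case 1: If $n_1k$ has children that are at the lowest level of the tree, then by (1),

$$ -\frac{q(n_1k)}{u(n_1k)} \log e + \lambda(n_1k) - \lambda(n_1) = 0. \qquad (4) $$

Substituting (3) into (4) gives

$$ -\frac{q(n_1k)}{u(n_1k)} \log e - \frac{\sum_{j=1}^{K(n_1k)} Q(n_1kj)}{u(n_1k)} \log e - \lambda(n_1) = -\frac{Q(n_1k)}{u(n_1k)} \log e - \lambda(n_1) = 0, \qquad (5) $$

that is,

$$ \frac{Q(n_1k)}{u(n_1k)} \log e = -\lambda(n_1). \qquad (6) $$

Case 2: If $n_1k$ has no children, then by (1),

$$ -\frac{q(n_1k)}{u(n_1k)} \log e - \lambda(n_1) = 0, $$

which is the same as (6), since $q(n_1k) = Q(n_1k)$ for a leaf.

Considering all such $n_1k$, $k = 1, \ldots, K(n_1)$, we have

$$ \frac{Q(n_1k)}{u(n_1k)} \log e = -\lambda(n_1), \quad k = 1, \ldots, K(n_1); \qquad \sum_{k=1}^{K(n_1)} u(n_1k) = u(n_1), \qquad (7) $$

which is the same problem as (2) and is solved in the same manner.

Continuing in this way (from the bottom to the top of $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$), we finally obtain

$$ u(nk) = \frac{Q(nk)}{\sum_{j=1}^{K(n)} Q(nj)}\, u(n), \quad \forall n \in \mathcal{I}(\mathcal{T}(\mathcal{P}(\mathcal{Y}))),\ k = 1, \ldots, K(n). \qquad (8) $$

Setting $l^*_{\mathcal{P}(\mathcal{Y})}(nk) = -\log u(nk)$ completes the proof. □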



Thus, Theorem 2 provides a method of calculating the optimal length function. We now present three strategies for building matched codes that approximate the optimal length function of Theorem 2. Figure 18 shows the process of building matched codes. At step 1801 the process begins at the root. Then at step 1802 one of three strategies (Shannon, Huffman, or arithmetic code) is used for code design for each node's immediate offspring, based on their normalized subtree probabilities. At step 1803 the ancestors' codewords for each node are concatenated.

For any node $n$ with $K(n) > 0$, the first matched code $\gamma^{(S)}_{\mathcal{P}(\mathcal{Y})}$ describes the step from $n$ to $nk$ using a Shannon code with alphabet $\{1, \ldots, K(n)\}$ and p.m.f. $\{Q(nk) / \sum_{j=1}^{K(n)} Q(nj)\}_{k=1}^{K(n)}$; the resulting description lengths are $l^{(S)}_{\mathcal{P}(\mathcal{Y})}(r) = 0$ and

$$ l^{(S)}_{\mathcal{P}(\mathcal{Y})}(nk) = l^{(S)}_{\mathcal{P}(\mathcal{Y})}(n) + \Big\lceil \log \frac{\sum_{j=1}^{K(n)} Q(nj)}{Q(nk)} \Big\rceil. $$

Codes $\gamma^{(H)}_{\mathcal{P}(\mathcal{Y})}$ and $\gamma^{(A)}_{\mathcal{P}(\mathcal{Y})}$ replace the Shannon codes of $\gamma^{(S)}_{\mathcal{P}(\mathcal{Y})}$ with Huffman and arithmetic codes, respectively, matched to the same p.m.f.s.
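The length formulas can be evaluated numerically for the partition of Figure 10A. The Python sketch below is illustrative: the subtree probabilities are hand-computed from Table 3 and the helper names are hypothetical. Passing the identity as `rounding` gives the real-valued optimum of Theorem 2 (integer constraint ignored), while `math.ceil` gives the matched Shannon code lengths.

```python
import math

# Subtree probabilities Q(n) for the Figure 10B tree, computed from Table 3.
Q = {"1": 0.21, "2": 0.79, "21": 0.10, "22": 0.19, "23": 0.37,
     "231": 0.11, "232": 0.06}


def children(n):
    return sorted(m for m in Q if len(m) == len(n) + 1 and m.startswith(n))


def lengths(rounding):
    """l(r) = 0 and l(nk) = l(n) + rounding(log2(sum_j Q(nj) / Q(nk)))."""
    out = {"": 0}
    stack = [""]
    while stack:
        n = stack.pop()
        kids = children(n)
        total = sum(Q[m] for m in kids)
        for m in kids:
            out[m] = out[n] + rounding(math.log2(total / Q[m]))
            stack.append(m)
    return out


ideal = lengths(lambda v: v)   # Theorem 2, integer constraint ignored
shannon = lengths(math.ceil)   # matched Shannon code lengths
for n in sorted(Q):
    print(n, round(ideal[n], 3), shannon[n])
```

For this tree the Shannon lengths are 3 for $(a_3, a_6)$, 1 for $(a_7)$, 4 for $(a_0)$, 3 for $(a_1)$, 2 for $(a_2)$, 3 for $(a_4)$, and 4 for $(a_5)$.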



Matched Huffman Coding

As an example, build the matched Huffman code for the partition in Figure 10A, working from the top to the bottom of the partition tree $\mathcal{T}$. A flow diagram illustrating the steps of this process is shown in Figure 11. At step 1101 we begin at the root node and design a Huffman code on the set of nodes descending from $\mathcal{T}$'s root, according to their subtree probabilities, i.e. nodes $\{(a_3, a_6), (a_7)\}$ with p.m.f.

$$ \{p_Y(a_3) + p_Y(a_6),\ p_Y(a_7) + p_Y(a_0) + p_Y(a_1) + p_Y(a_2) + p_Y(a_4) + p_Y(a_5)\} = \{.21, .79\}; $$

a Huffman code for these two branches is $\{0, 1\}$. Referring to Figure 10C, the codes calculated for the two nodes below the root node (given in parentheses) are 0 and 1.

At step 1102, for each subsequent tree node $n$ with $K(n) > 0$, consider $\{nk\}_{k=1}^{K(n)}$ as a new set, and do Huffman code design on this set, with p.m.f. $\{Q(nk) / \sum_{j=1}^{K(n)} Q(nj)\}_{k=1}^{K(n)}$. We first design a Huffman code for group $(a_7)$'s children $\{(a_0), (a_1), (a_2)\}$ according to p.m.f.

$$ \{p_Y(a_0)/Q,\ p_Y(a_1)/Q,\ (p_Y(a_2) + p_Y(a_4) + p_Y(a_5))/Q\} = \{.1/Q, .19/Q, .37/Q\}, $$

where $Q = p_Y(a_0) + p_Y(a_1) + p_Y(a_2) + p_Y(a_4) + p_Y(a_5) = .66$; a Huffman code for this set of branches is $\{00, 01, 1\}$.

Then we design Huffman code $\{0, 1\}$ for groups $\{(a_4), (a_5)\}$ with p.m.f.

$$ \{p_Y(a_4)/(p_Y(a_4) + p_Y(a_5)),\ p_Y(a_5)/(p_Y(a_4) + p_Y(a_5))\} = \{.11/.17, .06/.17\}. $$

The full codeword for any node $n$ is the concatenation of the codewords of all nodes traversed in moving from root $\mathcal{T}(r)$ to node $n$ in $\mathcal{T}$. The codewords for this example are shown in Figure 10C.
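The top-down design just described can be sketched in Python with the standard `heapq` module. This is an illustration under assumptions: the subtree and node probabilities are hand-computed from Table 3, the helper names are hypothetical, and ties and 0/1 labels are assigned arbitrarily, so individual codewords may differ from Figure 10C while the codeword lengths and the expected rate agree with the example above.

```python
import heapq
from itertools import count

# Figure 10B tree, hand-computed from Table 3: subtree probabilities Q(n)
# and node probabilities q(n).
Q = {"1": 0.21, "2": 0.79, "21": 0.10, "22": 0.19, "23": 0.37,
     "231": 0.11, "232": 0.06}
q = {"1": 0.21, "2": 0.13, "21": 0.10, "22": 0.19, "23": 0.20,
     "231": 0.11, "232": 0.06}


def children(n):
    return sorted(m for m in Q if len(m) == len(n) + 1 and m.startswith(n))


def huffman(weights):
    """Binary Huffman code for {label: weight}; returns {label: codeword}."""
    if len(weights) == 1:
        # A lone child still needs one bit so that gamma(n) stays a proper prefix.
        return {next(iter(weights)): "0"}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(w, next(tie), {label: ""}) for label, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {k: "0" + v for k, v in c0.items()}
        merged.update({k: "1" + v for k, v in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]


def matched_huffman(n="", prefix="", out=None):
    """Top down: Huffman-code the children of n on their Q values, then
    prepend the codeword of n (concatenation of the ancestors' step codes)."""
    out = {} if out is None else out
    kids = children(n)
    if kids:
        # Normalizing by sum_j Q(nj) would not change the Huffman code.
        step = huffman({m: Q[m] for m in kids})
        for m in kids:
            out[m] = prefix + step[m]
            matched_huffman(m, out[m], out)
    return out


code = matched_huffman()
rate = sum(q[n] * len(code[n]) for n in code)
print(code)
print(round(rate, 2))  # 2.12
```

On this example the sketch yields codeword lengths 1, 1, 3, 3, 2, 3, 3 for nodes 1, 2, 21, 22, 23, 231, 232, i.e. an expected rate of 2.12 bits per symbol.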

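Any matched Huffman code $\gamma^{(H)}_{\mathcal{P}(\mathcal{Y})}$ is shown to be optimal by Theorem 3.

Theorem 3 Given a partition $\mathcal{P}(\mathcal{Y})$, a matched Huffman code for $\mathcal{P}(\mathcal{Y})$ achieves the optimal expected rate over all matched codes for $\mathcal{P}(\mathcal{Y})$.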
Proof: Let $\mathcal{T}$ be the partition tree of $\mathcal{P}(\mathcal{Y})$. The codelength of a node $n \in \mathcal{T}$ is denoted by $l(n)$. The average length $\bar{l}$ for $\mathcal{P}(\mathcal{Y})$ is

$$\bar{l} = \sum_{n \in \mathcal{T}} q(n) l(n) = \sum_{k=1}^{K(r)} \left( Q(k) l(k) + \Delta l(k) \right),$$

where for each $k \in \{1, \ldots, K(r)\}$, $\Delta l(k) = \sum_{kn \in \mathcal{T}} q(kn) \left( l(kn) - l(k) \right)$, the sum running over all descendants $kn$ of node $k$.

Note that $\{Q(k) l(k)\}$ and $\{\Delta l(k)\}$ can be minimized independently. Thus

$$\min \bar{l} = \min \sum_{k=1}^{K(r)} Q(k) l(k) + \sum_{k=1}^{K(r)} \min \Delta l(k).$$

In matched Huffman coding, working from the top to the bottom of the partition tree, we first minimize $\sum_{k=1}^{K(r)} Q(k) l(k)$ over all integer lengths $l(k)$ by employing Huffman codes on $Q(k)$. We then minimize each $\Delta l(k)$ over all integer-length codes by similarly breaking each down layer by layer and minimizing the expected length at each layer. □
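As a worked check of this decomposition, take the Figure 10A example with the node probabilities implied by the text ($q((a_7)) = .79 - .66 = .13$, $q((a_2)) = .37 - .17 = .20$) and the codeword lengths of the matched Huffman code built above (lengths 1, 1, 3, 3, 2, 3, 3 for $(a_3,a_6), (a_7), (a_0), (a_1), (a_2), (a_4), (a_5)$). The root-level term and the $\Delta l$ terms add up to the directly computed average length (our arithmetic, offered as an illustration):

```latex
\begin{align*}
\sum_{k=1}^{K(r)} Q(k)\,l(k) &= .21 \cdot 1 + .79 \cdot 1 = 1, \\
\Delta l((a_3,a_6)) &= 0, \\
\Delta l((a_7)) &= .10 \cdot 2 + .19 \cdot 2 + .20 \cdot 1 + .11 \cdot 2 + .06 \cdot 2 = 1.12, \\
\bar{l} &= 1 + 0 + 1.12 = 2.12 \\
        &= .21 \cdot 1 + .13 \cdot 1 + .10 \cdot 3 + .19 \cdot 3 + .20 \cdot 2 + .11 \cdot 3 + .06 \cdot 3 .
\end{align*}
```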

Matched Arithmetic Coding 

In traditional arithmetic coding (with no side information), the description length of data sequence $y^n$ is $l(y^n) = \lceil -\log p_Y(y^n) \rceil + 1$, where $p_Y(y^n)$ is the probability of $y^n$. In designing the matched arithmetic code of $y^n$ for a given partition $\mathcal{P}(\mathcal{Y})$, we use the decoder's knowledge of $x^n$ to decrease the description length of $y^n$. The following example, illustrated in Figures 12B and 12C, demonstrates the techniques of matched arithmetic coding for the partition given in Figure 10A.

In traditional arithmetic coding, as shown in Figure 12A, data sequence $Y^n$ is represented by an interval of the $[0, 1)$ line. We describe $Y^n$ by describing the mid-point of the corresponding interval to sufficient accuracy to avoid confusion with neighboring intervals. We find the interval for $y^n$ recursively, by first breaking $[0, 1)$ into intervals corresponding to all possible values of $y_1$, then breaking the interval for the observed $Y_1$ into subintervals corresponding to all possible values of $Y_1 y_2$, and so on. Given the interval $A \subseteq [0, 1)$ for $Y^k$ for some $0 \le k < n$ (the interval for $Y^0$ is $[0, 1)$), the subintervals for $\{Y^k y_{k+1}\}$ are ordered subintervals of $A$ with lengths proportional to $p(y_{k+1})$.
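The recursive interval refinement just described can be sketched for a memoryless source as follows. This is a floating-point illustration only (a practical coder would use finite-precision arithmetic), and the function names and the three-symbol p.m.f. in the usage line are hypothetical. The codeword is taken to be the first $l(y^n) = \lceil -\log_2 p_Y(y^n) \rceil + 1$ bits of the interval's mid-point.

```python
import math

def arithmetic_interval(sequence, pmf):
    """Refine [0, 1): each symbol selects an ordered subinterval of the current
    interval whose length is proportional to the symbol's probability."""
    symbols = list(pmf)  # fixed ordering of the alphabet
    low, width = 0.0, 1.0
    for s in sequence:
        offset = sum(pmf[t] for t in symbols[:symbols.index(s)])
        low, width = low + offset * width, pmf[s] * width
    return low, low + width

def description_length(sequence, pmf):
    """l(y^n) = ceil(-log2 p(y^n)) + 1 bits."""
    low, high = arithmetic_interval(sequence, pmf)
    return math.ceil(-math.log2(high - low)) + 1

def codeword(sequence, pmf):
    """Describe the interval's mid-point to l(y^n) bits (truncated binary expansion)."""
    low, high = arithmetic_interval(sequence, pmf)
    length = description_length(sequence, pmf)
    mid = (low + high) / 2
    return format(math.floor(mid * 2 ** length), f"0{length}b")

pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
print(arithmetic_interval("ab", pmf), codeword("ab", pmf))  # (0.25, 0.375) 0101
```

The one extra bit beyond $\lceil -\log_2 p_Y(y^n) \rceil$ is what guarantees that every continuation of the truncated mid-point still falls inside the sequence's own interval, so neighboring intervals cannot be confused.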

In matched arithmetic coding for partition $\mathcal{P}(\mathcal{Y})$, as shown in Figure 12B, we again describe $Y^n$ by describing the mid-point of a recursively constructed subinterval of $[0, 1)$. In this case, however, if $Y_1$ lies in a node at depth $d_0$ in $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$, we break $[0, 1)$ into intervals corresponding to the nodes in

$$B = \{n : (K(n) = 0 \wedge d(n) \le d_0) \vee (K(n) > 0 \wedge d(n) = d_0)\}.$$

The interval for each $n \in B$ with parent $n_0$ has length proportional to

$$p^{(A)}(n) = p^{(A)}(n_0) \left( \frac{Q(n)}{\sum_{k=1}^{K(n_0)} Q(n_0 k)} \right) = p^{(A)}(n_0) \left( \frac{Q(n)}{Q(n_0) - q(n_0)} \right)$$

(here $p^{(A)}(r)$ is defined to equal 1 for the unique node $r$ at depth 0). Refining the interval for sequence $Y^{i-1}$ to find the subinterval for $Y^i$ involves finding the 1-level group $n \in \mathcal{P}(\mathcal{Y})$ such that $Y_i \in n$, and using $d(n)$ to calculate the appropriate $p^{(A)}$ values and break the current interval accordingly. We finally describe $Y^n$ by describing the center of its corresponding subinterval to an accuracy sufficient to distinguish it from its neighboring subintervals. To ensure unique decodability,

$$l^{(A)}(y^n) = \lceil -\log p^{(A)}(y^n) \rceil + 1,$$

where $p^{(A)}(y^n)$ is the length of the subinterval corresponding to string $y^n$. Given a fixed partition $\mathcal{P}(\mathcal{Y})$, for each $y \in \mathcal{Y}$ denote the node where symbol $y$ resides by $n(y)$, and let $n_0(y)$ represent the parent of node $n(y)$. Then



$$\begin{aligned}
l^{(A)}(y^n) &= \left\lceil \sum_{i=1}^{n} -\log p^{(A)}(n(y_i)) \right\rceil + 1 \\
&= \left\lceil \sum_{i=1}^{n} \left( -\log p^{(A)}(n_0(y_i)) - \log \frac{Q(n(y_i))}{\sum_{k} Q(n_0(y_i)k)} \right) \right\rceil + 1 \\
&= \left\lceil \sum_{i=1}^{n} l^*(y_i) \right\rceil + 1 < \sum_{i=1}^{n} l^*(y_i) + 2,
\end{aligned}$$

where $l^*(\cdot)$ is the optimal length function specified in Theorem 2. Thus the description length $l^{(A)}(y^n)$ in coding data sequence $y^n$ using a 1-dimensional matched arithmetic code $\gamma^{(A)}_{\mathcal{P}(\mathcal{Y})}$ satisfies $(1/n) l^{(A)}(y^n) < (1/n) \sum_{i=1}^{n} l^*(y_i) + 2/n$, giving a normalized description length arbitrarily close to the optimum for $n$ sufficiently large. We deal with floating-point precision issues using the same techniques applied to traditional arithmetic codes.
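The weight recursion $p^{(A)}(n) = p^{(A)}(n_0)\,Q(n)/(Q(n_0) - q(n_0))$ and the resulting description length can be computed directly on the partition tree. The sketch below is illustrative only: it reuses the Figure 10A tree with the node probabilities implied by the text (encoded as hypothetical `(name, q, children)` tuples), works in floating point with base-2 logarithms, and takes a sequence as the list of nodes $n(y_i)$ that contain its symbols.

```python
import math

# Partition tree of Figure 10A as (name, node probability q(n), children).
# Values are those implied by the text; only p(a3) + p(a6) = .21 matters.
TREE = ("r", 0.0, [
    ("(a3,a6)", 0.21, []),
    ("(a7)", 0.13, [
        ("(a0)", 0.10, []),
        ("(a1)", 0.19, []),
        ("(a2)", 0.20, [("(a4)", 0.11, []), ("(a5)", 0.06, [])]),
    ]),
])

def subtree_prob(node):
    """Q(n): the node probability plus the probabilities of all descendants."""
    _, q, kids = node
    return q + sum(subtree_prob(k) for k in kids)

def arithmetic_weights(node, p_node=1.0, out=None):
    """p^(A)(nk) = p^(A)(n) * Q(nk) / (Q(n) - q(n)), starting from p^(A)(r) = 1."""
    name, q, kids = node
    out = {name: p_node} if out is None else out
    denom = subtree_prob(node) - q  # equals the sum of the children's Q(nk)
    for kid in kids:
        out[kid[0]] = p_node * subtree_prob(kid) / denom
        arithmetic_weights(kid, out[kid[0]], out)
    return out

def matched_arithmetic_length(nodes, weights):
    """l^(A)(y^n) = ceil(-log2 prod_i p^(A)(n(y_i))) + 1 bits."""
    return math.ceil(sum(-math.log2(weights[n]) for n in nodes)) + 1

weights = arithmetic_weights(TREE)
print(round(weights["(a2)"], 4), round(weights["(a4)"], 4))  # 0.4429 0.2866
print(matched_arithmetic_length(["(a7)", "(a3,a6)", "(a4)"], weights))  # 6
```

For the node sequence $(a_7), (a_3, a_6), (a_4)$ the weights multiply to about .0475, so $l^{(A)} = \lceil 4.39 \rceil + 1 = 6$ bits. The exact weight of $(a_4)$ is about .2866; a hand computation that first rounds $p^{(A)}((a_2))$ to .44 obtains .2847 instead.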



As an example, again consider the p.m.f. of Table 3 and the partition of Figure 10A. If $Y_1 \in \{a_3, a_6, a_7\}$, $[0, 1)$ is broken into subintervals $[0, .21)$ for group $(a_3, a_6)$ and $[.21, 1)$ for group $(a_7)$, since

$$p^{(A)}((a_3, a_6)) = p^{(A)}(r) \frac{Q((a_3, a_6))}{Q(r)} = .21, \qquad p^{(A)}((a_7)) = p^{(A)}(r) \frac{Q((a_7))}{Q(r)} = .79.$$

If $Y_1 \in \{a_0, a_1, a_2\}$, $[0, 1)$ is broken into subintervals $[0, .21)$ for group $(a_3, a_6)$, $[.21, .33)$ for group $(a_0)$, $[.33, .56)$ for group $(a_1)$, and $[.56, 1)$ for group $(a_2)$, since

$$p^{(A)}((a_0)) = p^{(A)}((a_7)) \frac{Q((a_0))}{Q((a_7)) - q((a_7))} = .79 \left( \frac{.1}{.79 - .13} \right) \approx .12,$$

$$p^{(A)}((a_1)) = .79 \left( \frac{.19}{.79 - .13} \right) \approx .23, \qquad p^{(A)}((a_2)) = .79 \left( \frac{.37}{.79 - .13} \right) \approx .44.$$

Finally, if $Y_1 \in \{a_4, a_5\}$, $[0, 1)$ is broken into subintervals $[0, .21)$ for group $(a_3, a_6)$, $[.21, .33)$ for group $(a_0)$, $[.33, .56)$ for group $(a_1)$, $[.56, .84)$ for group $(a_4)$, and $[.84, 1)$ for group $(a_5)$, since

$$p^{(A)}((a_4)) = p^{(A)}((a_2)) \frac{Q((a_4))}{Q((a_2)) - q((a_2))} = .44 \left( \frac{.11}{.37 - .2} \right) = .2847,$$

$$p^{(A)}((a_5)) = p^{(A)}((a_2)) \frac{Q((a_5))}{Q((a_2)) - q((a_2))} = .44 \left( \frac{.06}{.37 - .2} \right) = .1553.$$

Figure 12B shows these intervals.

Figure 12C shows the recursive interval refinement procedure for a five-symbol sequence $Y^5$. Symbol $Y_1 = a_7$ gives the interval $[.21, 1)$ of length .79 (indicated by the bold line). Symbol $Y_2 = a_3$ refines the above interval to the interval $[.21, .3759)$ of length $.21 \cdot .79 = .1659$. Symbol $Y_3 = a_4$ refines that interval to the interval $[.3024, .3500)$ of length $.2847 \cdot .1659 \approx .0472$. This procedure continues until finally we find the interval $[0.3241, 0.3289)$.
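The interval layouts of Figure 12B and the first three refinement steps above can be reproduced with the following sketch. It is illustrative only and rests on stated assumptions: the tree encoding and node probabilities are the hypothetical ones implied by the text; nodes are visited in pre-order, which matches the left-to-right order of the intervals listed above; $B$ is read as all leaves at depth at most $d_0$ plus the internal nodes at depth $d_0$; and the weights are computed exactly rather than from rounded values, so endpoints agree with the text to about three decimal places.

```python
# Partition tree of Figure 10A as (name, node probability q(n), children).
TREE = ("r", 0.0, [
    ("(a3,a6)", 0.21, []),
    ("(a7)", 0.13, [
        ("(a0)", 0.10, []),
        ("(a1)", 0.19, []),
        ("(a2)", 0.20, [("(a4)", 0.11, []), ("(a5)", 0.06, [])]),
    ]),
])

def subtree_prob(node):
    """Q(n): node probability plus the probabilities of all descendants."""
    _, q, kids = node
    return q + sum(subtree_prob(k) for k in kids)

def annotate(node, p=1.0, depth=0, out=None):
    """List (name, depth, is_leaf, p^(A)) for every node in pre-order."""
    out = [] if out is None else out
    name, q, kids = node
    out.append((name, depth, not kids, p))
    denom = subtree_prob(node) - q
    for kid in kids:
        annotate(kid, p * subtree_prob(kid) / denom, depth + 1, out)
    return out

NODES = annotate(TREE)
DEPTH = {name: depth for name, depth, _, _ in NODES}

def layout(d0):
    """Break [0, 1) over B: leaves at depth <= d0 and internal nodes at depth d0."""
    low, cells = 0.0, {}
    for name, depth, leaf, p in NODES:
        if depth > 0 and ((leaf and depth <= d0) or (not leaf and depth == d0)):
            cells[name] = (low, low + p)
            low += p
    return cells

def refine(nodes):
    """Map each symbol's node onto the current interval, one symbol at a time."""
    low, high = 0.0, 1.0
    for name in nodes:
        a, b = layout(DEPTH[name])[name]
        width = high - low
        low, high = low + a * width, low + b * width
    return low, high

print({n: tuple(round(v, 2) for v in c) for n, c in layout(3).items()})
print([round(v, 4) for v in refine(["(a7)", "(a3,a6)", "(a4)"])])  # [0.3024, 0.35]
```

Because each symbol's layout depends only on the depth of its own node, a decoder that knows which node contains $Y_i$ (from its side information and the partition) can rebuild the same subdivision; symbols whose nodes lie at different depths simply see overlapping intervals.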

Notice that the intervals of some symbols overlap in the matched arithmetic code. For example, the intervals associated with symbols $a_4$ and $a_5$ subdivide the interval associated with symbol $a_2$ in the previous example. These overlapping intervals correspond to the situation in matched Huffman coding where one symbol's description is a prefix of another symbol's description. Again, for any legitimate partition, the decoder can uniquely distinguish between symbols with overlapping intervals and correctly decode $Y^n$ using its side information about $X^n$.

Optimal Partitions: Definitions and Properties 

The above describes optimal Shannon, Huffman, and arithmetic codes for matched lossless side-information coding with a given partition $\mathcal{P}(\mathcal{Y})$. The partition yielding the best performance remains to be found. Here we describe how to find optimal partitions for Huffman and arithmetic coding.

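Given a partition $\mathcal{P}(\mathcal{Y})$, let $l^{(H)}_{\mathcal{P}(\mathcal{Y})}$ and $l^*_{\mathcal{P}(\mathcal{Y})}$ be the Huffman and optimal description lengths, respectively, for $\mathcal{P}(\mathcal{Y})$. We say that $\mathcal{P}(\mathcal{Y})$ is optimal for matched Huffman side-information coding on $p(x, y)$ if $E l^{(H)}_{\mathcal{P}(\mathcal{Y})}(Y) \le E l^{(H)}_{\mathcal{P}'(\mathcal{Y})}(Y)$ for any other partition $\mathcal{P}'(\mathcal{Y})$ for $p(x, y)$ (and therefore, by Theorems 1 and 3, $E l^{(H)}_{\mathcal{P}(\mathcal{Y})}(Y) \le E l(Y)$, where $l$ is the description length for any other instantaneous lossless side-information code on $p(x, y)$). We say that $\mathcal{P}(\mathcal{Y})$ is optimal for matched arithmetic side-information coding on $p(x, y)$ if $E l^*_{\mathcal{P}(\mathcal{Y})}(Y) \le E l^*_{\mathcal{P}'(\mathcal{Y})}(Y)$ for any other partition $\mathcal{P}'(\mathcal{Y})$ for $p(x, y)$.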

Some properties of optimal partitions follow. Lemma 2 demonstrates that there is no loss of generality associated with restricting our attention to partitions $\mathcal{P}(\mathcal{Y})$ for which the root is the only empty internal node. Lemma 3 shows that each subtree of an optimal partition tree is an optimal partition on the sub-alphabet it describes. Lemmas 2 and 3 hold under either of the above definitions of optimality. Lemma 4 implies that an optimal partition for matched Huffman coding is not necessarily optimal for arithmetic coding, as shown in Corollary 1. Properties specific to optimal partitions for Huffman coding or optimal partitions for arithmetic coding follow.

Lemma 2 There exists an optimal partition $\mathcal{P}^*(\mathcal{Y})$ for $p(x, y)$ for which every node except for the root of $\mathcal{P}^*(\mathcal{Y})$ is non-empty and no node has exactly one child.

Proof: If any non-root node $n$ of partition $\mathcal{P}(\mathcal{Y})$ is empty, then removing $n$, so that $\{nk\}_{k=1}^{K(n)}$ descend directly from $n$'s parent, gives a new partition $\mathcal{P}'(\mathcal{Y})$. Any matched code on $\mathcal{P}(\mathcal{Y})$, including the optimal matched code on $\mathcal{P}(\mathcal{Y})$, is a matched code on $\mathcal{P}'(\mathcal{Y})$. If $n$ has exactly one child, then combining $n$ and its child yields a legitimate partition $\mathcal{P}'(\mathcal{Y})$; the optimal matched code for $\mathcal{P}'(\mathcal{Y})$ yields an expected rate no worse than that of the optimal matched code for $\mathcal{P}(\mathcal{Y})$. □

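Lemma 3 If $\mathcal{T}_1, \ldots, \mathcal{T}_{K(n)}$ are the subtrees descending from any node $n$ in optimal partition $\mathcal{P}^*(\mathcal{Y})$ for $p(x, y)$, then the tree where $\{\mathcal{T}_1, \ldots, \mathcal{T}_{K(n)}\}$ descend from an empty root is identical to $\mathcal{T}(\mathcal{P}^*(\mathcal{Y}'))$, where $\mathcal{P}^*(\mathcal{Y}')$ is an optimal partition of $\mathcal{Y}' = \bigcup_{k=1}^{K(n)} \mathcal{T}_k$ for $p(x, y)$.

Proof: Since the matched code's description can be broken into a description of $n$ followed by a matched code on $\{\mathcal{T}_1, \ldots, \mathcal{T}_{K(n)}\}$, and the corresponding description lengths add, the partition described by $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$ cannot be optimal unless the partition described by $\{\mathcal{T}_1, \ldots, \mathcal{T}_{K(n)}\}$ is. □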

Lemma 4 Let $p_1$ and $p_2$ denote two p.m.f.s for alphabets $\mathcal{Y}_1$ and $\mathcal{Y}_2$, respectively, and use $H(p)$ and $R^{(H)}(p)$ to denote the entropy and expected Huffman coding rate, respectively, for p.m.f. $p$. Then $H(p_1) > H(p_2)$ does not imply $R^{(H)}(p_1) > R^{(H)}(p_2)$.

Proof: The following example demonstrates this property. Let $p_1 = \{0.5, 0.25, 0.25\}$ and $p_2 = \{0.49, 0.49, 0.02\}$; then $H(p_1) = 1.5$ and $H(p_2) \approx 1.12$. However, the rate of the Huffman code for $p_1$ is 1.5, while that for $p_2$ is 1.51. □
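The numbers in this counterexample are easy to verify. The sketch below (hypothetical helper names, base-2 logarithms) computes the entropy and the expected binary Huffman rate, using the standard fact that a Huffman code's expected length equals the sum of the probabilities merged at each step.

```python
import heapq
import math
from itertools import count

def entropy(p):
    """H(p) in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def huffman_rate(p):
    """Expected binary Huffman codeword length: the sum of all merged weights."""
    tie = count()
    heap = [(x, next(tie)) for x in p]
    heapq.heapify(heap)
    rate = 0.0
    while len(heap) > 1:
        a, _ = heapq.heappop(heap)
        b, _ = heapq.heappop(heap)
        rate += a + b
        heapq.heappush(heap, (a + b, next(tie)))
    return rate

p1, p2 = [0.5, 0.25, 0.25], [0.49, 0.49, 0.02]
print(round(entropy(p1), 2), round(entropy(p2), 2))            # 1.5 1.12
print(round(huffman_rate(p1), 2), round(huffman_rate(p2), 2))  # 1.5 1.51
```

The gap arises because the Huffman code for $p_2$ must spend a full bit on one of the two probability-.49 symbols and two bits on the other, while $p_1$ is dyadic and its Huffman code meets the entropy exactly.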
Corollary 1 The optimal partitions for matched Huffman side-information coding and matched arithmetic side-information coding are not necessarily identical.

Proof: The following example demonstrates this property. Let alphabet $\mathcal{Y} = \{b_0, b_1, b_2, b_3, b_4\}$ have marginal p.m.f. $\{0.49, 0.01, 0.25, 0.24, 0.01\}$, and suppose that $\mathcal{P}_1(\mathcal{Y}) = \{(b_0, b_1), (b_2), (b_3, b_4)\}$ and $\mathcal{P}_2(\mathcal{Y}) = \{(b_0), (b_2, b_3), (b_1, b_4)\}$ are partitions of $\mathcal{Y}$ for $p(x, y)$. The node probabilities of $\mathcal{P}_1(\mathcal{Y})$ and $\mathcal{P}_2(\mathcal{Y})$ are $p_1 = \{0.5, 0.25, 0.25\}$ and $p_2 = \{0.49, 0.49, 0.02\}$, respectively. By the proof of Lemma 4, $\mathcal{P}_1(\mathcal{Y})$ is a better partition for Huffman coding, while $\mathcal{P}_2(\mathcal{Y})$ is better for arithmetic coding. □

In the arguments that follow, we show that there exist pairs of groups $(\mathcal{G}_I, \mathcal{G}_J)$ such that $\mathcal{G}_I \cap \mathcal{G}_J = \emptyset$ but $\mathcal{G}_I$ and $\mathcal{G}_J$ cannot both descend from the root of an optimal partition. This result is derived by showing conditions under which there exists a group $\mathcal{G}^*$ that combines the members of $\mathcal{G}_I$ and $\mathcal{G}_J$ and for which replacing $\{\mathcal{G}_I, \mathcal{G}_J\}$ with $\{\mathcal{G}^*\}$ in $\mathcal{P}(\mathcal{Y})$ guarantees a performance improvement.

The circumstances under which "combined" groups guarantee better performance than 
separate groups differ for arithmetic and Huffman codes. Theorems 4 and 5 treat the two cases in 
turn. The following definitions are needed to describe those results. 

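We say that 1-level groups $\mathcal{G}_1$ and $\mathcal{G}_2$ (or nodes $\mathcal{T}(\mathcal{G}_1)$ and $\mathcal{T}(\mathcal{G}_2)$) can be combined under $p(x, y)$ if each pair $y_1 \in \mathcal{G}_1$, $y_2 \in \mathcal{G}_2$ can be combined under $p(x, y)$.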
If $\mathcal{G}_I, \mathcal{G}_J \in \mathcal{P}(\mathcal{Y})$, so that $\mathcal{G}_I$ and $\mathcal{G}_J$ extend directly from the root $r$ of $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$, nodes $I$ and $J$ are the roots of $\mathcal{T}(\mathcal{G}_I)$ and $\mathcal{T}(\mathcal{G}_J)$, and $\mathcal{G}_0$ denotes the 1-level group at some node $n_0$ in $\mathcal{T}(\mathcal{G}_J)$, we say that $\mathcal{G}_I$ can be combined with $\mathcal{G}_J$ at $n_0$ if (1) $I$ can be combined with $n_0$ and each of $n_0$'s descendants in $\mathcal{T}(\mathcal{G}_J)$, and (2) $n_0$ and each of $n_0$'s ancestors in $\mathcal{T}(\mathcal{G}_J)$ can be combined with $I$ and each of $I$'s descendants in $\mathcal{T}(\mathcal{G}_I)$. The result of combining $\mathcal{G}_I$ with $\mathcal{G}_J$ at $\mathcal{G}_0$ is a new group $\mathcal{G}^*$. Group $\mathcal{G}^*$ modifies $\mathcal{G}_J$ by replacing $\mathcal{G}_0$ with the 1-level group $(I, \mathcal{G}_0)$ and adding the descendants of $I$ (in addition to the descendants of $\mathcal{G}_0$) as descendants of $(I, \mathcal{G}_0)$ in $\mathcal{T}(\mathcal{G}^*)$. Figure 10D shows an example in which two groups $\mathcal{G}_I$ and $\mathcal{G}_J$ of a partition combine at a node of $\mathcal{T}(\mathcal{G}_J)$, together with the resulting modified partition $\mathcal{P}^*(\mathcal{Y})$ and its combined group $\mathcal{G}^*$.

Lemma 5 For any constant $A > 0$, the function $f(x) = x \log(1 + A/x)$ is monotonically increasing in $x$ for all $x > 0$.

Proof: Take $\log$ to be the natural logarithm (any other base only scales $f$ by a positive constant). The first-order derivative of $f(x)$ is $f'(x) = \log(1 + A/x) - A/(x + A)$. Let $u = A/x$ and $g(u) = f'(x)|_{x = A/u} = \log(1 + u) - u/(u + 1)$; then $u > 0$ and $g(0) = 0$. The first-order derivative of $g(u)$ is $g'(u) = u/(u + 1)^2$. For any $u > 0$, $g'(u) > 0$; thus $g(u) > 0$. So for any $x > 0$, $f'(x) > 0$; that is, $f(x)$ is monotonically increasing in $x$. □

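Theorem 4 Let $\mathcal{P}(\mathcal{Y}) = \{\mathcal{G}_1, \ldots, \mathcal{G}_m\}$ be a partition of $\mathcal{Y}$ under $p(x, y)$. Suppose that $\mathcal{G}_I \in \mathcal{P}(\mathcal{Y})$ can be combined with $\mathcal{G}_J \in \mathcal{P}(\mathcal{Y})$ at $\mathcal{G}_0$, where $\mathcal{G}_0$ is the 1-level group at some node $n_0$ of $\mathcal{T}(\mathcal{G}_J)$. Let $\mathcal{P}^*(\mathcal{Y})$ be the resulting partition. Then $E l^*_{\mathcal{P}^*(\mathcal{Y})}(Y) \le E l^*_{\mathcal{P}(\mathcal{Y})}(Y)$.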
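Proof: Let $n_0 = J j_1 \cdots j_M = J j^M$, so that $n_0$'s parent is $n_p = J j^{M-1}$ (with $J j^0 = J$). Define

$S_1 = \{J j^i : 1 \le i \le M\}$ (i.e., the set of nodes on the path to $n_0$, excluding node $J$);
$S_2 = \{n \in \mathcal{T}(\mathcal{G}_J) : n \text{ is a sibling of node } s,\ s \in S_1\}$;
$S_3 = (S_1 \cup \{J\}) \cap \{n_0\}^c$ (i.e., the set of nodes on the path to $n_0$, excluding node $n_0$).

For any node $n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))$, let $Q_n$ and $q_n$ denote the subtree and node probabilities, respectively, of node $n$ in $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$, and define $\Delta Q_n = Q_n - q_n = \sum_j Q_{nj}$. Figure 16 shows the subtree probabilities associated with combining $\mathcal{G}_I$ with $\mathcal{G}_J$ at $\mathcal{G}_0$. Let the resulting new group be $\mathcal{G}^*$.

Note that the sum of the subtree probabilities of $\mathcal{G}_I$ and $\mathcal{G}_J$ equals the subtree probability of $\mathcal{G}^*$, and thus the optimal average rates of the groups in $\mathcal{P}(\mathcal{Y}) \cap \{\mathcal{G}_I, \mathcal{G}_J\}^c$ are not changed by the combination. Thus if $(l_I, l_J)$ and $(\hat{l}_I, \hat{l}_J)$ are the optimal average rates for $(\mathcal{G}_I, \mathcal{G}_J)$ in $\mathcal{P}(\mathcal{Y})$ and $\mathcal{P}^*(\mathcal{Y})$, respectively, then $\Delta l_I + \Delta l_J = (l_I - \hat{l}_I) + (l_J - \hat{l}_J)$ gives the total rate cost of using partition $\mathcal{P}(\mathcal{Y})$ rather than partition $\mathcal{P}^*(\mathcal{Y})$. Here

$$l_I = -Q_I \log Q_I - \sum_{k=1}^{K(I)} Q_{Ik} \log \frac{Q_{Ik}}{\Delta Q_I} + \tilde{l}_I,$$

$$\hat{l}_I = -Q_I \log (Q_I + Q_J) - Q_I \sum_{i=1}^{M} \log \frac{Q_I + Q_{Jj^i}}{Q_I + \Delta Q_{Jj^{i-1}}} - \sum_{k=1}^{K(I)} Q_{Ik} \log \frac{Q_{Ik}}{\Delta Q_I + \Delta Q_{n_0}} + \tilde{l}_I,$$

so that

$$\Delta l_I = Q_I \log \left( \frac{Q_I + Q_J}{Q_I + \Delta Q_J} \cdot \frac{Q_I + Q_{Jj_1}}{Q_I + \Delta Q_{Jj_1}} \cdots \frac{Q_I + Q_{n_p}}{Q_I + \Delta Q_{n_p}} \cdot \frac{Q_I + Q_{n_0}}{Q_I} \right) + \Delta Q_I \log \frac{\Delta Q_I}{\Delta Q_I + \Delta Q_{n_0}},$$

where $\tilde{l}_I$ represents the portion of the average rate unchanged by the combination of $\mathcal{G}_I$ and $\mathcal{G}_J$.

It follows that $\Delta l_I \ge 0$, since $\log \prod_{n \in S_3} (Q_I + Q_n)/(Q_I + \Delta Q_n) \ge 0$ (each factor is at least 1 because $Q_n \ge \Delta Q_n$), and since the monotonicity of $x \log(1 + c/x)$ in $x > 0$ for $c > 0$ (Lemma 5), together with $\Delta Q_I \le Q_I$ and $\Delta Q_{n_0} \le Q_{n_0}$, implies that

$$\Delta Q_I \log\left(1 + \frac{\Delta Q_{n_0}}{\Delta Q_I}\right) \le Q_I \log\left(1 + \frac{\Delta Q_{n_0}}{Q_I}\right) \le Q_I \log\left(1 + \frac{Q_{n_0}}{Q_I}\right).$$

Similarly, the portion of $l_J$ unchanged by the combination cancels in the difference; each node on the path from $J$ to $n_0$ gains $Q_I$ in subtree probability, while the nodes in $S_2$ keep their subtree probabilities but see their parents' normalizers grow by $Q_I$. Hence

$$\Delta l_J = \sum_{n \in S_3} \left[ Q_n \log\left(1 + \frac{Q_I}{Q_n}\right) - \Delta Q_n \log\left(1 + \frac{Q_I}{\Delta Q_n}\right) \right] + Q_{n_0} \log\left(1 + \frac{Q_I}{Q_{n_0}}\right) - \Delta Q_{n_0} \log\left(1 + \frac{\Delta Q_I}{\Delta Q_{n_0}}\right).$$

Thus $\Delta l_J \ge 0$ by the monotonicity of $x \log(1 + c/x)$. Since neither the optimal rate of $\mathcal{G}_I$ nor that of $\mathcal{G}_J$ increases after combining, we have the desired result. □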
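Because $E l^*$ depends only on the partition tree and its node probabilities, the inequality of Theorem 4 can be spot-checked numerically. The sketch below uses a hypothetical five-symbol tree (the symbols and probabilities are made up for illustration, and combinability under $p(x, y)$ is simply assumed): $\mathcal{G}_I = (b)$, and $\mathcal{G}_J$ has root $J = (a)$ with children $(c)$ and $(d)$ and a grandchild $(e)$ below $(c)$. Combining $\mathcal{G}_I$ with $\mathcal{G}_J$ at $n_0 = (c)$ replaces $(c)$ by the 1-level group $(b, c)$.

```python
import math

def subtree_prob(node):
    """Q(n) for a (name, q, children) node."""
    _, q, kids = node
    return q + sum(subtree_prob(k) for k in kids)

def expected_ideal_length(node, length=0.0):
    """E l*: every symbol pays -log2(Q(nk) / (Q(n) - q(n))) bits per tree level."""
    _, q, kids = node
    total = q * length
    denom = subtree_prob(node) - q
    for kid in kids:
        total += expected_ideal_length(kid, length - math.log2(subtree_prob(kid) / denom))
    return total

BEFORE = ("r", 0.0, [
    ("(b)", 0.2, []),                                  # G_I, a single 1-level group
    ("(a)", 0.1, [("(c)", 0.3, [("(e)", 0.2, [])]),    # G_J rooted at J = (a)
                  ("(d)", 0.2, [])]),
])
AFTER = ("r", 0.0, [
    ("(a)", 0.1, [("(b,c)", 0.5, [("(e)", 0.2, [])]),  # G* after combining at (c)
                  ("(d)", 0.2, [])]),
])

print(round(expected_ideal_length(BEFORE), 3), round(expected_ideal_length(AFTER), 3))
# 1.326 0.688
```

In this toy case the combined partition lowers $E l^*$ from about 1.326 to about 0.688 bits, consistent with the theorem's claim that the combination never increases the optimal matched arithmetic rate.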

Unfortunately, Theorem 4 does not hold for matched Huffman coding. Theorem 5 gives a result that does apply to Huffman coding.

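Theorem 5 Given partition $\mathcal{P}(\mathcal{Y})$ of $\mathcal{Y}$ on $p(x, y)$, if $\mathcal{G}_I, \mathcal{G}_J \in \mathcal{P}(\mathcal{Y})$ satisfy: (1) $\mathcal{G}_I$ is a 1-level group and (2) $\mathcal{G}_I$ can be combined with $\mathcal{G}_J$ at root $J$ of $\mathcal{T}(\mathcal{G}_J)$ to form partition $\mathcal{P}^*(\mathcal{Y})$, then

$$E l^{(H)}_{\mathcal{P}^*(\mathcal{Y})}(Y) \le E l^{(H)}_{\mathcal{P}(\mathcal{Y})}(Y).$$

Proof: Let $\alpha$ denote the matched Huffman code for $\mathcal{P}(\mathcal{Y})$, and use $\alpha_I$ and $\alpha_J$ to denote this code's binary descriptions for nodes $I$ and $J$. The binary description for any symbol in $\mathcal{G}_I$ equals $\alpha_I$ ($\alpha(y) = \alpha_I$ for each $y \in \mathcal{G}_I$), while the binary description for any symbol in $\mathcal{G}_J$ has prefix $\alpha_J$ ($\alpha(y) = \alpha_J \alpha'(y)$ for each $y \in \mathcal{G}_J$, where $\alpha'$ is a matched Huffman code for $\mathcal{G}_J$). Let $\alpha_{\min}$ be the shorter of $\alpha_I$ and $\alpha_J$. Since $\alpha$ is a matched Huffman code for $\mathcal{P}(\mathcal{Y})$ and $\mathcal{P}^*(\mathcal{Y})$ is a partition of $\mathcal{Y}$ on $p(x, y)$,

$$\alpha^*(y) = \begin{cases} \alpha_{\min} \alpha'(y) & \text{if } y \in \mathcal{G}_I \cup \mathcal{G}_J \\ \alpha(y) & \text{otherwise} \end{cases}$$

(with $\alpha'(y)$ taken to be empty for $y \in \mathcal{G}_I$) is a matched code for $\mathcal{P}^*(\mathcal{Y})$. Further, $|\alpha_{\min}| \le |\alpha_I|$ and $|\alpha_{\min}| \le |\alpha_J|$ imply that the expected length of $\alpha^*(Y)$ is less than or equal to the expected length of $\alpha(Y)$ (but perhaps greater than the expected length of the matched Huffman code for $\mathcal{P}^*(\mathcal{Y})$). □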
General Lossless Instantaneous MASCs: Problem Statement, Partition Pairs, and Optimal Matched Codes



We here drop the side-information coding assumption that $X$ (or $Y$) can be decoded independently and consider MASCs in the case where it may be necessary to decode the two symbol descriptions together. Here, the partition $\mathcal{P}(\mathcal{Y})$ used in lossless side-information coding is replaced by a pair of partitions $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$. As in side-information coding, $\mathcal{P}(\mathcal{X})$ and $\mathcal{P}(\mathcal{Y})$ describe the prefix and equivalence relationships for descriptions $\{\gamma_X(x) : x \in \mathcal{X}\}$ and $\{\gamma_Y(y) : y \in \mathcal{Y}\}$, respectively. Given constraints on $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ that are both necessary and sufficient to guarantee that a code with the prefix and equivalence relationships described by $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ yields an MASC that is both instantaneous and lossless, Theorem 1 generalizes easily to this coding scenario, so every general instantaneous lossless MASC can be described as a matched code on $\mathcal{P}(\mathcal{X})$ and a matched code on $\mathcal{P}(\mathcal{Y})$ for some $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ satisfying the appropriate constraints.

In considering partition pairs $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ for use in lossless instantaneous MASCs, it is necessary but not sufficient that each be a legitimate partition for side-information coding on its respective alphabet. (If $\mathcal{P}(\mathcal{Y})$ fails to uniquely describe $Y$ when the decoder knows $X$ exactly, then it must certainly fail for joint decoding as well. The corresponding statement for $\mathcal{P}(\mathcal{X})$ also holds. These conditions are, however, insufficient in the general case, because complete knowledge of $X$ may be required for decoding with $\mathcal{P}(\mathcal{Y})$ and vice versa.) Necessary and sufficient conditions for $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ to give an instantaneous MASC, and necessary and sufficient conditions for $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ to give a lossless MASC, follow.
For $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ to yield an instantaneous MASC, the decoder must recognize when it reaches the end of $\gamma_X(X)$ and $\gamma_Y(Y)$. The decoder proceeds as follows. We think of a matched code on $\mathcal{P}$ as a multi-stage description, with each stage corresponding to a level in $\mathcal{T}(\mathcal{P})$. Starting at the roots of $\mathcal{T}(\mathcal{P}(\mathcal{X}))$ and $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$, the decoder reads the first-stage descriptions of $\gamma_X(X)$ and $\gamma_Y(Y)$, traversing the described paths from the roots to nodes $n_x$ and $n_y$ in partitions $\mathcal{T}(\mathcal{P}(\mathcal{X}))$ and $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$, respectively. (The decoder can determine that it has reached the end of a single-stage description if and only if the matched code is itself instantaneous.) If either of the nodes reached is empty, then the decoder knows that it must read more of the description; thus we assume, without loss of generality, that $n_x$ and $n_y$ are not empty. Let $\mathcal{T}_x$ and $\mathcal{T}_y$ be the subtrees descending from $n_x$ and $n_y$ (including $n_x$ and $n_y$, respectively). (The subtree descending from a leaf node is simply that node.) For instantaneous coding, one of the following conditions must hold:

(A) $X \in \mathcal{T}_x$ or $n_y$ is a leaf implies that $Y \in n_y$, and $Y \in \mathcal{T}_y$ or $n_x$ is a leaf implies that $X \in n_x$;

(B) $X \in \mathcal{T}_x$ implies that $Y \notin n_y$;

(C) $Y \in \mathcal{T}_y$ implies that $X \notin n_x$.

Under condition (A), the decoder recognizes that it has reached the end of $\gamma_X(X)$ and $\gamma_Y(Y)$. Under condition (B), the decoder recognizes that it has not reached the end of $\gamma_Y(Y)$ and reads the next-stage description, traversing the described path in $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$ to node $n'_y$ with subtree $\mathcal{T}'_y$. Condition (C) similarly leads to a new node $n'_x$ and subtree $\mathcal{T}'_x$. If none of these conditions holds, then the decoder cannot determine whether to continue reading one or both of the descriptions, and the code cannot be instantaneous. The decoder continues traversing $\mathcal{T}(\mathcal{P}(\mathcal{X}))$ and $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$ until it determines the 1-level groups $n_x$ and $n_y$ with $X \in n_x$ and $Y \in n_y$. At each step before the decoding halts, one (or more) of the conditions (A), (B), and (C) must be satisfied.

For $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ to give a lossless MASC, for any $(x, y) \in \mathcal{X} \times \mathcal{Y}$ with $p(x, y) > 0$, following the above procedure on $(\gamma_X(x), \gamma_Y(y))$ must lead to final nodes $(n_x, n_y)$ that satisfy:

(D) $(x, y) \in n_x \times n_y$, and for any other $x' \in n_x$ and $y' \in n_y$, $p(x, y') = p(x', y) = p(x', y') = 0$.

The following lemma gives a simplified test for determining whether partition pair $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ yields a lossless instantaneous MASC. We call this test the MASC prefix condition. Lemma 6 reduces to Lemma 1 when either $\mathcal{P}(\mathcal{X}) = \{\{x\} : x \in \mathcal{X}\}$ or $\mathcal{P}(\mathcal{Y}) = \{\{y\} : y \in \mathcal{Y}\}$. In either of these cases, the general MASC problem reduces to the side-information problem of Section II.

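Lemma 6 Partition pair $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ for $p(x, y)$ yields a lossless instantaneous MASC if and only if, for any $x, x' \in \mathcal{X}$ such that $\{\gamma_X(x), \gamma_X(x')\}$ does not satisfy the prefix condition, $\{\gamma_Y(y) : y \in \mathcal{A}_x \cup \mathcal{A}_{x'}\}$ satisfies the prefix condition, and, for any $y, y' \in \mathcal{Y}$ such that $\{\gamma_Y(y), \gamma_Y(y')\}$ does not satisfy the prefix condition, $\{\gamma_X(x) : x \in \mathcal{B}_y \cup \mathcal{B}_{y'}\}$ satisfies the prefix condition. Here $\mathcal{A}_x = \{y \in \mathcal{Y} : p(x, y) > 0\}$ and $\mathcal{B}_y = \{x \in \mathcal{X} : p(x, y) > 0\}$.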
Proof: First, we show that if lossless instantaneous MASC decoding fails, then the MASC prefix condition must be violated. If lossless instantaneous MASC decoding fails, then there must be a time in the decoding procedure at which we decode to nodes $(n_x, n_y)$ with subtrees $\mathcal{T}_x$ and $\mathcal{T}_y$, but one of the following occurs:

(1) none of the conditions (A), (B), or (C) is satisfied;

(2) condition (A) is satisfied, but condition (D) is violated.

In case (1), one of the following must happen: (a) the decoder determines that $Y \in n_y$ but cannot determine whether or not $X \in n_x$; (b) the decoder determines that $X \in n_x$ but cannot determine whether or not $Y \in n_y$; (c) the decoder can determine neither whether $Y \in n_y$ nor whether $X \in n_x$. If (a) occurs, then there must exist $y, y' \in n_y$, $x \in n_x$, and $x' \in \mathcal{T}_x \cap n_x^c$ with $p(x, y)p(x', y) > 0$ or $p(x, y)p(x', y') > 0$, which means $x, x' \in \mathcal{B}_y \cup \mathcal{B}_{y'}$. If (b) occurs, then there must exist $x, x' \in n_x$, $y \in n_y$, and $y' \in \mathcal{T}_y \cap n_y^c$ with $p(x, y)p(x, y') > 0$ or $p(x, y)p(x', y') > 0$, which means $y, y' \in \mathcal{A}_x \cup \mathcal{A}_{x'}$. If (c) occurs, then there must exist $x \in n_x$, $x' \in \mathcal{T}_x \cap n_x^c$, $y \in n_y$, and $y' \in \mathcal{T}_y \cap n_y^c$ with $p(x, y)p(x', y') > 0$ or $p(x', y)p(x, y') > 0$, which means $y, y' \in \mathcal{A}_x \cup \mathcal{A}_{x'}$. Thus in subcases (a), (b), and (c) of case (1), the MASC prefix condition is violated.

In case (2), assume the true values of $(X, Y)$ are $(x, y)$; then one of the following must occur: (a) we decode $Y = y$ but cannot decode $X$; (b) we decode $X = x$ but cannot decode $Y$; (c) we can decode neither $X$ nor $Y$. If (a) occurs, then there must exist an $x' \in n_x$ with $p(x', y) > 0$, which means $x, x' \in \mathcal{B}_y$. If (b) occurs, then there must exist a $y' \in n_y$ with $p(x, y') > 0$, which means $y, y' \in \mathcal{A}_x$. If (c) occurs, then there must exist $x' \in n_x$ and $y' \in n_y$ with $p(x', y') > 0$ or $p(x, y') > 0$ or $p(x', y) > 0$, which means $x, x' \in \mathcal{B}_y \cup \mathcal{B}_{y'}$ or $y, y' \in \mathcal{A}_x \cup \mathcal{A}_{x'}$. Thus in subcases (a), (b), and (c) of case (2), the MASC prefix condition is likewise violated.

Next, we show that if the MASC prefix condition is violated, then we cannot achieve a lossless instantaneous MASC. Here we use $n_x$ and $n_y$ to denote the nodes of the partition trees satisfying $x \in n_x$ and $y \in n_y$. We assume that symbols $x, x' \in \mathcal{X}$ and $y, y' \in \mathcal{Y}$ satisfy $y, y' \in \mathcal{A}_x \cup \mathcal{A}_{x'}$ and $x, x' \in \mathcal{B}_y \cup \mathcal{B}_{y'}$, but $\gamma_X(x)$ and $\gamma_X(x')$ do not satisfy the prefix condition, and $\gamma_Y(y)$ and $\gamma_Y(y')$ do not satisfy the prefix condition; i.e., the MASC prefix condition is violated. Then one of the following must hold:

(1) $\gamma_X(x) = \gamma_X(x')$ and $\gamma_Y(y) = \gamma_Y(y')$;

(2) $\gamma_X(x) = \gamma_X(x')$ and $\gamma_Y(y)$ is a prefix of $\gamma_Y(y')$;

(3) $\gamma_Y(y) = \gamma_Y(y')$ and $\gamma_X(x)$ is a prefix of $\gamma_X(x')$;

(4) $\gamma_X(x)$ is a prefix of $\gamma_X(x')$ and $\gamma_Y(y)$ is a prefix of $\gamma_Y(y')$.

In case (1), there must be a time in the decoding procedure at which the decoder stops at $(n_x, n_y)$ and determines that $X \in n_x$ and $Y \in n_y$. However, since $y, y' \in \mathcal{A}_x \cup \mathcal{A}_{x'}$, all of the following are possible given $X \in n_x$ and $Y \in n_y$: (a) $y \in \mathcal{A}_x \cap \mathcal{A}_{x'}^c$ and $y' \in \mathcal{A}_{x'} \cap \mathcal{A}_x^c$; (b) $y \in \mathcal{A}_{x'} \cap \mathcal{A}_x^c$ and $y' \in \mathcal{A}_x \cap \mathcal{A}_{x'}^c$; (c) $y, y' \in \mathcal{A}_x \cap \mathcal{A}_{x'}$. Thus the decoder cannot determine which of the symbol pairs $(x, y)$, $(x, y')$, $(x', y)$, or $(x', y')$ was described.

In case (2), there must be a time in the decoding procedure at which the decoder reaches $(n_x, n_y)$ and determines that $X \in n_x$. However, as in case (1), all three possibilities can occur, and the decoder has no extra information with which to determine whether or not $Y \in n_y$.

In case (3), there must be a time in the decoding procedure at which the decoder reaches $(n_x, n_y)$ and determines that $Y \in n_y$. However, as in case (1), all three possibilities can occur, and the decoder has no extra information with which to determine whether or not $X \in n_x$.

In case (4), there must be a time in the decoding procedure at which the decoder reaches $(n_x, n_y)$ and needs to determine whether or not $X \in n_x$ and whether or not $Y \in n_y$. However, again as in case (1), all three possibilities can occur, and the decoder has no extra information with which to decode instantaneously. □
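For small alphabets, the MASC prefix condition of Lemma 6 can be checked mechanically. The sketch below is a hypothetical helper operating on codeword dicts $\gamma_X$, $\gamma_Y$ (bit strings) and a joint p.m.f. dict keyed by $(x, y)$. It reads the pairs in Lemma 6 as including $x = x'$ and $y = y'$, treating two equal codewords as violating the prefix condition; this is an interpretive assumption, chosen because it appears to be the reading under which the test reduces to the side-information condition of Lemma 1 when the other alphabet is fully resolved.

```python
from itertools import combinations_with_replacement

def prefix_free(words):
    """True if no codeword in the list is a prefix of (or equal to) another entry."""
    return not any(a.startswith(b) or b.startswith(a)
                   for i, a in enumerate(words) for b in words[i + 1:])

def masc_prefix_condition(gamma_x, gamma_y, p):
    """Test the MASC prefix condition (Lemma 6) for codeword dicts and joint p.m.f. p."""
    A = {x: {y for y in gamma_y if p.get((x, y), 0) > 0} for x in gamma_x}
    B = {y: {x for x in gamma_x if p.get((x, y), 0) > 0} for y in gamma_y}
    for x, x2 in combinations_with_replacement(gamma_x, 2):
        if not prefix_free([gamma_x[x], gamma_x[x2]]):
            if not prefix_free([gamma_y[y] for y in A[x] | A[x2]]):
                return False
    for y, y2 in combinations_with_replacement(gamma_y, 2):
        if not prefix_free([gamma_y[y], gamma_y[y2]]):
            if not prefix_free([gamma_x[x] for x in B[y] | B[y2]]):
                return False
    return True

# Perfectly correlated X and Y: X is sent with one bit, Y with no bits at all.
p_corr = {("x0", "y0"): 0.5, ("x1", "y1"): 0.5}
print(masc_prefix_condition({"x0": "0", "x1": "1"}, {"y0": "", "y1": ""}, p_corr))  # True
print(masc_prefix_condition({"x0": "", "x1": ""}, {"y0": "", "y1": ""}, p_corr))    # False
```

In the first call every empty $\gamma_Y$ codeword is resolved by the distinct $\gamma_X$ codewords of the $x$ values it can occur with; in the second neither description distinguishes anything, so the condition fails.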

Optimality of a matched code for partition $\mathcal{P}(\mathcal{Y})$ is independent of whether $\mathcal{P}(\mathcal{Y})$ is used in a side-information code or an MASC. Thus our optimal matched code design methods from lossless side-information coding apply here as well, giving optimal matched Shannon, Huffman, and arithmetic codes for any partition pair $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ for $p(x, y)$ that satisfies the MASC prefix condition.
Optimal Partition Properties



Given a partition pair $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ that satisfies the MASC prefix condition, $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ is optimal for use in a matched Huffman MASC on $p(x, y)$ if $(E l^{(H)}_{\mathcal{P}(\mathcal{X})}(X), E l^{(H)}_{\mathcal{P}(\mathcal{Y})}(Y))$ sits on the lower boundary of the rates achievable by a lossless MASC on alphabet $\mathcal{X} \times \mathcal{Y}$. Similarly, $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ is optimal for use in a matched arithmetic MASC on $p(x, y)$ if $(E l^*_{\mathcal{P}(\mathcal{X})}(X), E l^*_{\mathcal{P}(\mathcal{Y})}(Y))$ sits on the lower boundary of the rates achievable by a lossless MASC on alphabet $\mathcal{X} \times \mathcal{Y}$. Again, $l^{(H)}$ and $l^*$ denote the Huffman and optimal description lengths, respectively, for partition $\mathcal{P}$, and Huffman coding is optimal among instantaneous codes on a fixed alphabet. (Mixed codes (e.g., Huffman coding on $\mathcal{X}$ and arithmetic coding on $\mathcal{Y}$) are also possible within this framework.) While the lower convex hull of the rate region of interest is achievable through time sharing, we describe the lower boundary of achievable rates rather than the convex hull of that region in order to increase the richness of points that can be achieved without time sharing. This region describes points that minimize the rate needed to describe $Y$ subject to a fixed constraint on the rate needed to describe $X$, or vice versa. The regions are not identical since the curves they trace are not convex. Their convex hulls are, of course, identical.

Using Lemma 7, we again restrict our attention to partitions with no empty nodes except for the root. The proof of this result does not follow immediately from that of the corresponding result for side-information codes. By Lemma 6, whether or not two symbols can be combined for one alphabet is a function of the partition on the other alphabet. Thus we must here show not only that removing empty nodes does not increase the expected rate associated with the optimal code for a given partition, but also that it does not further restrict the family of partitions allowed on the other alphabet.

Lemma 7 For each partition pair $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ that achieves performance on the lower boundary of the achievable rate region, there exists a partition pair $(\mathcal{P}^*(\mathcal{X}), \mathcal{P}^*(\mathcal{Y}))$ achieving the same rate performance as $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$, for which every node except for the roots of $\mathcal{P}^*(\mathcal{X})$ and $\mathcal{P}^*(\mathcal{Y})$ is non-empty and no node has exactly one child.

Proof: Case 1: If any non-root node $n$ of partition $\mathcal{P}(\mathcal{X})$ is empty, then we remove $n$, so that $\{nk\}_{k=1}^{K(n)}$ descend directly from $n$'s parent. Case 2: If any node $n$ has exactly one child $n1$, then we combine $n$ and $n1$ to form the 1-level group $(n, n1)$, with $\{n1k\}_{k=1}^{K(n1)}$ descending directly from $(n, n1)$. In both cases, the rate of the new partition does not increase and the prefix condition among $\mathcal{P}(\mathcal{X})$'s non-empty nodes is unchanged; thus, by Lemma 6, the set of symbols of $\mathcal{Y}$ that can be combined likewise remains the same. □

Partition Design 

By Lemma 6, whether or not two symbols can be combined in a general MASC is a function of the partition on the other alphabet. Fixing one partition before designing the other allows us to fix which symbols of the second alphabet can and cannot be combined, and thereby simplifies the search for legitimate partitions on the second alphabet. In the discussion that follows, we fix $\mathcal{P}(\mathcal{X})$ and then use a variation on the partition search algorithm of lossless side-information coding to find the best $\mathcal{P}(\mathcal{Y})$ for which $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ yields an instantaneous lossless MASC. Traversing all $\mathcal{P}(\mathcal{X})$ allows us to find all partitions with performances on the lower boundary of the achievable rate region.

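To simplify the discussion that follows, we modify the terminology used in lossless side-information coding to restrict our attention from all partitions on $\mathcal{Y}$ to only those partitions $\mathcal{P}(\mathcal{Y})$ for which $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ satisfies the MASC prefix condition given a fixed $\mathcal{P}(\mathcal{X})$. In particular, using Lemma 6, symbols $y$ and $y'$ can be combined given $\mathcal{P}(\mathcal{X})$ if and only if there do not exist $x, x' \in \mathcal{X}$ such that $\gamma_X(x) \preceq \gamma_X(x')$ and $y, y' \in \mathcal{A}_x \cup \mathcal{A}_{x'}$. (Here $\gamma_X$ is any matched code for $\mathcal{P}(\mathcal{X})$, and $\gamma_X(x) \preceq \gamma_X(x')$ means that $\gamma_X(x)$ is a prefix of, or equal to, $\gamma_X(x')$.) Equivalently, $y$ and $y'$ can be combined given $\mathcal{P}(\mathcal{X})$ if, for each pair $x, x' \in \mathcal{X}$ such that $\gamma_X(x) \preceq \gamma_X(x')$, $(p(x, y) + p(x', y))(p(x, y') + p(x', y')) = 0$. Given this new definition, the corresponding definitions for M-level groups, partitions on $\mathcal{Y}$, and matched codes for partitions on $\mathcal{Y}$ for a fixed $\mathcal{P}(\mathcal{X})$ follow immediately.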
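The combinability test just stated can be applied pair by pair. The sketch below is a hypothetical helper, not part of the described system: it takes a matched code on $\mathcal{P}(\mathcal{X})$ as a dict of bit strings and a joint p.m.f. as a dict keyed by $(x, y)$, and it includes $x = x'$ (since $\gamma_X(x) \preceq \gamma_X(x)$), so two symbols that both lie in a single $\mathcal{A}_x$ are never combinable.

```python
from itertools import combinations, combinations_with_replacement

def combinable_pairs(gamma_x, p, ys):
    """Unordered pairs {y, y'} that can be combined given a fixed P(X).

    gamma_x: dict x -> codeword (bit string) of a matched code on P(X).
    p: dict (x, y) -> joint probability; missing keys mean probability 0.
    """
    blocked = set()
    for x, x2 in combinations_with_replacement(gamma_x, 2):
        a, b = gamma_x[x], gamma_x[x2]
        if not (a.startswith(b) or b.startswith(a)):
            continue  # neither codeword is a prefix of (or equal to) the other
        for y, y2 in combinations(ys, 2):
            mass_y = p.get((x, y), 0) + p.get((x2, y), 0)
            mass_y2 = p.get((x, y2), 0) + p.get((x2, y2), 0)
            if mass_y * mass_y2 > 0:
                blocked.add(frozenset((y, y2)))
    return {frozenset(pair) for pair in combinations(ys, 2)} - blocked

p = {("x0", "y0"): 0.3, ("x0", "y1"): 0.2, ("x1", "y2"): 0.5}
print(combinable_pairs({"x0": "0", "x1": "1"}, p, ["y0", "y1", "y2"]))
```

Here $y_0$ and $y_1$ both occur with $x_0$, so they cannot share a description, while $y_2$ can be combined with either of them because $\gamma_X$ already separates $x_0$ from $x_1$. If $\gamma_X$ sent nothing at all, every pair would be blocked.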

Next consider the search for the optimal partition on $\mathcal{Y}$ given a fixed partition $\mathcal{P}(\mathcal{X})$. We use $\mathcal{P}^*(\mathcal{Y} | \mathcal{P}(\mathcal{X}))$ to denote this partition. The procedure used to search for $\mathcal{P}^*(\mathcal{Y} | \mathcal{P}(\mathcal{X}))$ is almost identical to the procedure used to search for the optimal partition in side-information coding. First, we determine which symbols from $\mathcal{Y}$ can be combined given $\mathcal{P}(\mathcal{X})$. In this case, for each node $n \in \mathcal{T}(\mathcal{P}(\mathcal{X}))$, if $\mathcal{T}_n$ is the subtree of $\mathcal{T}(\mathcal{P}(\mathcal{X}))$ with root $n$, then for each $n' \in \mathcal{T}_{nk}$ with $k \in \{1, \ldots, K(n)\}$, symbols $y, y' \in \mathcal{A}_n \cup \mathcal{A}_{n'}$ cannot be combined given $\mathcal{P}(\mathcal{X})$. Here $\mathcal{A}_n = \{y : y \in \mathcal{A}_x, x \in n\}$. Traversing the tree from top to bottom yields the full list of pairs of symbols that cannot be combined given $\mathcal{P}(\mathcal{X})$. All pairs not on this list can be combined given $\mathcal{P}(\mathcal{X})$. Given this list, we construct a list of groups and recursively build the optimal partition $\mathcal{P}^*(\mathcal{Y} | \mathcal{P}(\mathcal{X}))$ using the approach described in an earlier section.






Given a method for finding the optimal partition $\mathcal{P}^*(\mathcal{Y} | \mathcal{P}(\mathcal{X}))$ for a fixed partition $\mathcal{P}(\mathcal{X})$, we next need a means of listing all partitions $\mathcal{P}(\mathcal{X})$. (Note that we really wish to list all $\mathcal{P}(\mathcal{X})$, not only those that would be optimal for side-information coding. As a result, the procedure for constructing the list of groups is slightly different from that in lossless side-information coding.) For any alphabet $\mathcal{X}' \subseteq \mathcal{X}$, the procedure begins by making a list $\mathcal{L}_{\mathcal{X}'}$ of all (single- or multi-level) groups that may appear in a partition of $\mathcal{X}'$ for $p(x, y)$ satisfying Lemma 7 (i.e., every node except for the root is non-empty, and $K(n) \ne 1$). The list is initialized as $\mathcal{L}_{\mathcal{X}'} = \{(x) : x \in \mathcal{X}'\}$. For each symbol $x \in \mathcal{X}'$ and each non-empty subset $\mathcal{S} \subseteq \{z \in \mathcal{X}' : z \text{ can be combined with } x \text{ under } p(x, y)\}$, we find the set of partitions $\{\mathcal{P}(\mathcal{S})\}$ of $\mathcal{S}$ for $p(x, y)$; for each $\mathcal{P}(\mathcal{S})$, we add $x$ to the empty root of $\mathcal{T}(\mathcal{P}(\mathcal{S}))$ if $\mathcal{P}(\mathcal{S})$ contains more than one group, or to the root of the single group in $\mathcal{P}(\mathcal{S})$ otherwise; then we add the resulting new group to $\mathcal{L}_{\mathcal{X}'}$ if $\mathcal{L}_{\mathcal{X}'}$ does not yet contain the same group.

s 

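After constructing the above list of groups, we build a collection of partitions of $\mathcal{X}'$ made of groups on that list. If any group $\mathcal{G} \in \mathcal{C}_{\mathcal{X}'}$ contains all of the elements of $\mathcal{X}'$, then $\{\mathcal{G}\}$ is a complete partition. Otherwise, the algorithm systematically builds a partition, adding one group at a time from $\mathcal{C}_{\mathcal{X}'}$ to the set $\mathcal{P}(\mathcal{X}')$ until $\mathcal{P}(\mathcal{X}')$ is a complete partition. For $\mathcal{G} \in \mathcal{C}_{\mathcal{X}'}$ to be added to $\mathcal{P}(\mathcal{X}')$, it must satisfy $\mathcal{G} \cap \mathcal{G}' = \emptyset$ for all $\mathcal{G}' \in \mathcal{P}(\mathcal{X}')$. The collection of partitions for $\mathcal{X}'$ is named $\mathcal{L}_{\mathcal{P}(\mathcal{X}')}$.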
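The partition-building step above is an exact-cover search over the list of groups. The following minimal Python sketch illustrates it under simplifying assumptions: each entry of $\mathcal{C}_{\mathcal{X}'}$ is represented only by the set of symbols it contains (a full implementation would carry each group's tree structure along), symbols are assumed sortable, and the function name `partitions_from_groups` and the toy symbols `x1`, `x2`, `x3` are hypothetical.

```python
def partitions_from_groups(alphabet, groups):
    """Return every list of pairwise-disjoint groups that exactly covers `alphabet`.

    Each group is given by the set of symbols it contains (a stand-in for the
    full tree structure of a list entry); symbols must be sortable.
    """
    alphabet = frozenset(alphabet)
    groups = [frozenset(g) for g in groups if frozenset(g) <= alphabet]
    results = []

    def extend(chosen, covered):
        if covered == alphabet:
            results.append(list(chosen))
            return
        # Branch only on groups holding the smallest uncovered symbol, so each
        # partition is produced once rather than once per ordering of its groups.
        pivot = min(alphabet - covered)
        for g in groups:
            if pivot in g and not (g & covered):
                chosen.append(g)
                extend(chosen, covered | g)
                chosen.pop()

    extend([], frozenset())
    return results


candidate_groups = [{"x1"}, {"x2"}, {"x3"}, {"x1", "x2"}, {"x1", "x2", "x3"}]
for part in partitions_from_groups(["x1", "x2", "x3"], candidate_groups):
    print(sorted(sorted(g) for g in part))
```

Branching on the smallest uncovered symbol is a design choice of this sketch; it plays the role of the "add one group at a time until complete" loop while avoiding duplicate partitions.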



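We construct the optimal partition $\mathcal{P}^*(\mathcal{Y} \mid \mathcal{P}(\mathcal{X}))$ for each $\mathcal{P}(\mathcal{X}) \in \mathcal{L}_{\mathcal{P}(\mathcal{X})}$ and choose those partition pairs $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ that minimize the expected rate needed to describe $Y$ given a fixed constraint on the expected rate needed to describe $X$ (or vice versa).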
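Choosing the partition pairs that minimize the rate for $Y$ under a constraint on the rate for $X$ amounts to keeping the pairs that no other pair beats in both rates. A minimal sketch, assuming each candidate partition pair has already been reduced to a hypothetical `(R_X, R_Y, label)` tuple; the function names and the numbers in the usage example are made up for illustration only.

```python
def lower_boundary(candidates):
    """Non-dominated (R_X, R_Y, label) candidates, sorted by R_X.

    Sorting by (R_X, R_Y) and keeping each candidate whose R_Y is strictly below
    every R_Y seen so far leaves exactly the pairs that minimize R_Y for some
    budget on R_X.
    """
    frontier = []
    best_ry = float("inf")
    for rx, ry, label in sorted(candidates, key=lambda c: (c[0], c[1])):
        if ry < best_ry:
            frontier.append((rx, ry, label))
            best_ry = ry
    return frontier


def best_for_budget(candidates, rx_max):
    """Candidate with the smallest R_Y among those with R_X <= rx_max (or None)."""
    feasible = [c for c in candidates if c[0] <= rx_max]
    return min(feasible, key=lambda c: c[1]) if feasible else None


# Hypothetical rate pairs for five partition pairs (illustration only).
cands = [(1.0, 3.0, "A"), (1.5, 2.0, "B"), (2.0, 2.5, "C"), (2.5, 1.0, "D"), (1.0, 3.5, "E")]
print([label for _, _, label in lower_boundary(cands)])   # ['A', 'B', 'D']
print(best_for_budget(cands, 2.0))                         # (1.5, 2.0, 'B')
```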
Near-Lossless Instantaneous Multiple Access Source Coding: Problem Statement, Partition Pairs, and Optimal Matched Codes

Finally, we generalize the MASC problem from lossless instantaneous side-information and general MASCs to near-lossless instantaneous side-information and general MASCs. For any fixed $\epsilon > 0$, we call MASC $((\gamma_X, \gamma_Y), \gamma^{-1})$ a near-lossless instantaneous MASC for $P_e \le \epsilon$ if $((\gamma_X, \gamma_Y), \gamma^{-1})$ yields instantaneous decoding with $P_e = \Pr(\gamma^{-1}(\gamma_X(X), \gamma_Y(Y)) \neq (X, Y)) \le \epsilon$. For instantaneous decoding in a near-lossless MASC, we require that for any input sequences $x_1, x_2, x_3, \ldots$ and $y_1, y_2, y_3, \ldots$ with $p(x_1, y_1) > 0$, the instantaneous decoder reconstructs some reproduction of $(x_1, y_1)$ by reading no more and no less than the first $|\gamma_X(x_1)|$ bits from $\gamma_X(x_1)\gamma_X(x_2)\gamma_X(x_3)\cdots$ and the first $|\gamma_Y(y_1)|$ bits from $\gamma_Y(y_1)\gamma_Y(y_2)\gamma_Y(y_3)\cdots$ (without prior knowledge of these lengths). That is, we require that the decoder correctly determines the length of the description of each $(x, y)$ with $p(x, y) > 0$ even when it incorrectly reconstructs the values of $x$ and $y$. This requirement prevents decoding errors from propagating through a loss of synchronization at the decoder.

Theorem 6 gives the near-lossless MASC prefix property. Recall that the notation $\gamma_Y(y) \prec \gamma_Y(y')$ means that $\gamma_Y(y)$ is a proper prefix of $\gamma_Y(y')$, disallowing $\gamma_Y(y) = \gamma_Y(y')$.

Theorem 6 Partition pair $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ can be used in a near-lossless instantaneous MASC on $p(x,y)$ if and only if both of the following properties are satisfied:

(A) for any $x, x' \in \mathcal{X}$ such that $\gamma_X(x) \prec \gamma_X(x')$, $\{\gamma_Y(y) : y \in \mathcal{A}_x \cup \mathcal{A}_{x'}\}$ is prefix free;

(B) for any $x, x' \in \mathcal{X}$ such that $\gamma_X(x) = \gamma_X(x')$, $\{\gamma_Y(y) : y \in \mathcal{A}_x \cup \mathcal{A}_{x'}\}$ is free of proper prefixes.



Proof: If either condition (A) or condition (B) is not satisfied, then there exist symbols $x, x' \in \mathcal{X}$ and $y, y' \in \mathcal{Y}$ such that $y, y' \in \mathcal{A}_x \cup \mathcal{A}_{x'}$, and one of the following is true: (1) $\gamma_X(x) = \gamma_X(x')$ and $\gamma_Y(y) \prec \gamma_Y(y')$; (2) $\gamma_Y(y) = \gamma_Y(y')$ and $\gamma_X(x) \prec \gamma_X(x')$; (3) $\gamma_X(x) \prec \gamma_X(x')$ and $\gamma_Y(y) \prec \gamma_Y(y')$. In any of these cases, the decoder cannot determine where to stop decoding one or both of the binary descriptions, by an argument like that in Lemma 6. The result is a code that is not instantaneous.

For the decoder to be unable to recognize when it has reached the end of $\gamma_X(X)$ and $\gamma_Y(Y)$, one of the following must occur: (1) the decoder determines that $X \in n_x$, but cannot determine whether or not $Y \in n_y$; (2) the decoder determines that $Y \in n_y$, but cannot determine whether or not $X \in n_x$; (3) the decoder cannot determine whether or not $X \in n_x$ or $Y \in n_y$. Following the argument used in Lemma 6, each of these cases leads to a violation of either (A) or (B) (or both).

Thus the near-lossless prefix property differs from the lossless prefix property only in allowing $\gamma_X(x) = \gamma_X(x')$ and $\gamma_Y(y) = \gamma_Y(y')$ when $y, y' \in \mathcal{A}_x \cup \mathcal{A}_{x'}$. In near-lossless side-information coding of $Y$ given $X$, this condition simplifies as follows. For any $y, y' \in \mathcal{Y}$ for which there exists an $x \in \mathcal{X}$ with $p(x,y)p(x,y') > 0$, $\gamma_Y(y) \prec \gamma_Y(y')$ is disallowed (as in lossless coding) but $\gamma_Y(y) = \gamma_Y(y')$ is allowed (this was disallowed in lossless coding). In this case, giving $y$ and $y'$ descriptions $\gamma_Y(y) \prec \gamma_Y(y')$ would leave the decoder no means of determining whether to decode $|\gamma_Y(y)|$ bits or $|\gamma_Y(y')|$ bits. (The decoder knows only the value of $x$, and both $p(x,y)$ and $p(x,y')$ are nonzero.) Giving $y$ and $y'$ descriptions $\gamma_Y(y) = \gamma_Y(y')$ allows instantaneous (but not error-free) decoding: the decoder decodes to the symbol with the given description that maximizes $p(\cdot \mid x)$. In the more general case, if $(\mathcal{G}^{(X)}, \mathcal{G}^{(Y)})$ are the 1-level groups described by $(\gamma_X(X), \gamma_Y(Y))$, the above conditions allow instantaneous decoding of the descriptions of $\mathcal{G}^{(X)}$ and $\mathcal{G}^{(Y)}$. A decoding error occurs if and only if there is more than one pair $(x, y) \in \mathcal{G}^{(X)} \times \mathcal{G}^{(Y)}$ with $p(x,y) > 0$. In this case, the decoder reconstructs the symbols as $\arg\max_{(x,y) \in \mathcal{G}^{(X)} \times \mathcal{G}^{(Y)}} p(x,y)$.
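The decoding rule above can be sketched in Python as follows. This is an illustrative sketch, not the document's implementation: `decode_cell` is a hypothetical helper, the joint p.m.f. is taken from Table 5 of this document, and giving $a_4$ and $a_5$ one shared description is an arbitrary choice made only to produce a lossy cell. With $X$ known to the decoder, each cell is $\{x\} \times \{a_4, a_5\}$.

```python
def decode_cell(p, gx, gy):
    """Decoder output and error mass for one pair of 1-level groups.

    All (x, y) in gx x gy share the same pair of descriptions, so the decoder
    outputs the most probable pair; every other pair in the cell is an error.
    Ties are broken arbitrarily (here: by max() over (prob, pair) tuples).
    """
    cell = [(p[x][y], (x, y)) for x in gx for y in gy]
    best_prob, best_pair = max(cell)
    return best_pair, sum(q for q, _ in cell) - best_prob


SYMS = ["a1", "a2", "a3", "a4", "a5"]
ROWS = [[0.10, 0.00, 0.00, 0.00, 0.00],   # Table 5, X = a1
        [0.00, 0.10, 0.00, 0.00, 0.10],   # X = a2
        [0.00, 0.00, 0.10, 0.15, 0.20],   # X = a3
        [0.05, 0.00, 0.00, 0.00, 0.00],   # X = a4
        [0.00, 0.00, 0.05, 0.10, 0.05]]   # X = a5
P_XY = {x: dict(zip(SYMS, row)) for x, row in zip(SYMS, ROWS)}

# Side-information decoding of Y given X when a4 and a5 share one description.
decoded = {x: decode_cell(P_XY, [x], ["a4", "a5"]) for x in SYMS}
for x, (pair, err) in decoded.items():
    print(x, "->", pair[1], "error mass", round(err, 4))
```

Summing the per-cell error masses gives the error penalty of the shared description, which is the quantity analyzed next.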

Decoding Error Probability and Distortion Analysis 

As discussed above, the benefit of near-lossless coding is a potential savings in rate. The 
cost of that improvement is the associated error penalty, which we quantify here. 

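By Lemma 6, any 1-level group $\mathcal{G} \subseteq \mathcal{Y}$ is a legitimate group in near-lossless side-information coding of $Y$ given $X$. The minimal penalty for a code with $\gamma_Y(y) = \gamma_Y(y')$ for all $y, y' \in \mathcal{G}$ is

$$P_e(\mathcal{G}) = \sum_{x \in \mathcal{X}} \Big[ \sum_{y \in \mathcal{G}} p(x,y) - \max_{y \in \mathcal{G}} p(x,y) \Big].$$

This minimal error penalty is achieved by decoding the description of $\mathcal{G}$ to $\hat{y} = \arg\max_{y' \in \mathcal{G}} p(x, y')$ when $X = x$. Multi-level group $\mathcal{G} = (\mathcal{R} : \mathcal{C}(\mathcal{R}))$ is a legitimate group for side-information coding of $Y$ given $X$ if and only if for any $x \in \mathcal{X}$ and $y \in \mathcal{R}$, $y' \in \mathcal{C}(\mathcal{R})$ implies $p(x,y)p(x,y') = 0$. In this case,

$$P_e(\mathcal{G}) = P_e(\mathcal{R}) + \sum_{\mathcal{G}' \in \mathcal{C}(\mathcal{R})} P_e(\mathcal{G}').$$

That is, the error penalty of a multi-level group equals the sum of the error penalties of the 1-level groups it contains. Thus for any partition $\mathcal{P}(\mathcal{Y})$ satisfying the near-lossless MASC prefix property,

$$P_e(\mathcal{P}(\mathcal{Y})) = \sum_{n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))} P_e(n).$$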

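Similarly, given a partition $\mathcal{P}(\mathcal{X})$, a 1-level group $\mathcal{G} \subseteq \mathcal{Y}$ is a legitimate group for a general near-lossless MASC given $\mathcal{P}(\mathcal{X})$ if for any $y, y' \in \mathcal{G}$, $y$ and $y'$ do not both belong to $\mathcal{A}_x \cup \mathcal{A}_{x'}$ for any $x, x'$ such that $\gamma_X(x) \prec \gamma_X(x')$. A multi-level group $\mathcal{G} = (\mathcal{R} : \mathcal{C}(\mathcal{R}))$ on $\mathcal{Y}$ is a legitimate group for a general near-lossless MASC if $\mathcal{R}$ and all members of $\mathcal{C}(\mathcal{R})$ are legitimate, and for any $y \in \mathcal{R}$ and $y' \in \mathcal{C}(\mathcal{R})$, $y$ and $y'$ do not both belong to $\mathcal{A}_x \cup \mathcal{A}_{x'}$ for any $x, x'$ such that $\gamma_X(x)$ is a prefix of $\gamma_X(x')$.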

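For any pair of nodes $n_x \in \mathcal{T}(\mathcal{P}(\mathcal{X}))$ and $n_y \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))$, the minimal penalty for $(n_x, n_y)$ is

$$P_e(n_x, n_y) = \sum_{x \in n_x,\, y \in n_y} p(x,y) - \max_{x \in n_x,\, y \in n_y} p(x,y).$$

Decoding the descriptions of $n_x$ and $n_y$ to $\arg\max_{x \in n_x,\, y \in n_y} p(x,y)$ gives this minimal error penalty. Thus the minimal penalty for using partition pair $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ satisfying the near-lossless MASC prefix property is

$$P_e(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y})) = \sum_{n_x \in \mathcal{T}(\mathcal{P}(\mathcal{X}))} \sum_{n_y \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))} P_e(n_x, n_y).$$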
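The partition-pair penalty can be evaluated directly from its definition. In this minimal sketch (hypothetical function name `pair_penalty`), each node of $\mathcal{T}(\mathcal{P}(\mathcal{X}))$ and $\mathcal{T}(\mathcal{P}(\mathcal{Y}))$ is given by the list of symbols it contains, and empty nodes contribute nothing. The node lists below are illustrative choices on the Table 5 p.m.f. that only exercise the arithmetic; they are not checked against the near-lossless MASC prefix property.

```python
def pair_penalty(p, nodes_x, nodes_y):
    """P_e(P(X), P(Y)): sum over node pairs of (cell mass - largest entry).

    nodes_x / nodes_y list the symbol sets of the nodes of T(P(X)) and T(P(Y)).
    """
    total = 0.0
    for nx in nodes_x:
        for ny in nodes_y:
            cell = [p[x][y] for x in nx for y in ny]
            if cell:                         # an empty node adds no penalty
                total += sum(cell) - max(cell)
    return total


SYMS = ["a1", "a2", "a3", "a4", "a5"]
ROWS = [[0.10, 0.00, 0.00, 0.00, 0.00],   # Table 5, X = a1
        [0.00, 0.10, 0.00, 0.00, 0.10],   # X = a2
        [0.00, 0.00, 0.10, 0.15, 0.20],   # X = a3
        [0.05, 0.00, 0.00, 0.00, 0.00],   # X = a4
        [0.00, 0.00, 0.05, 0.10, 0.05]]   # X = a5
P_XY = {x: dict(zip(SYMS, row)) for x, row in zip(SYMS, ROWS)}

lossless_x = [["a1"], ["a2"], ["a3"], ["a4"], ["a5"]]   # X described losslessly
merged_x = [["a1"], ["a2"], ["a3", "a5"], ["a4"]]       # a3, a5 share a node
nodes_y = [["a1"], ["a2"], ["a3"], ["a4", "a5"]]        # a4, a5 share a node
print(round(pair_penalty(P_XY, lossless_x, nodes_y), 4))  # 0.2
print(round(pair_penalty(P_XY, merged_x, nodes_y), 4))    # 0.35
```

With singleton nodes for $X$, the sum reduces to the side-information penalty $P_e(\mathcal{G})$ given earlier.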



Since near-lossless coding may be of most interest for use in lossy coding, probability of 
error may not always be the most useful measure of performance in a near-lossless code. In 
lossy codes, the increase in distortion caused by decoding errors more directly measures the 
impact of the error. We next quantify this impact for a fixed distortion measure $d(a, \hat{a}) \ge 0$. If $d$
is the Hamming distortion, then the distortion analysis is identical to the error probability
analysis.

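In side-information coding of $Y$ given $X$, the minimal distortion penalty for 1-level group $\mathcal{G}$ is

$$D(\mathcal{G}) = \sum_{x \in \mathcal{X}} \min_{\hat{y} \in \mathcal{G}} \sum_{y \in \mathcal{G}} p(x,y)\, d(y, \hat{y}).$$

This value is achieved when the description of $\mathcal{G}$ is decoded to $\arg\min_{\hat{y} \in \mathcal{G}} \sum_{y \in \mathcal{G}} p(x,y)\, d(y, \hat{y})$ when $X = x$. Thus for any partition $\mathcal{P}(\mathcal{Y})$ satisfying the near-lossless MASC prefix property, the distortion penalty associated with using this near-lossless code rather than a lossless code is

$$D(\mathcal{P}(\mathcal{Y})) = \sum_{n \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))} D(n).$$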



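In general near-lossless MASC coding, the corresponding distortion penalty for any partition pair $(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y}))$ that satisfies the near-lossless MASC prefix property is

$$D(\mathcal{P}(\mathcal{X}), \mathcal{P}(\mathcal{Y})) = \sum_{n_x \in \mathcal{T}(\mathcal{P}(\mathcal{X}))} \sum_{n_y \in \mathcal{T}(\mathcal{P}(\mathcal{Y}))} \min_{(\hat{x}, \hat{y}) \in n_x \times n_y} \sum_{x \in n_x,\, y \in n_y} p(x,y)\, d\big((x,y), (\hat{x}, \hat{y})\big).$$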
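The side-information distortion penalty $D(\mathcal{G})$ can be computed directly, with the decoder choosing, for each $x$, the reproduction $\hat{y} \in \mathcal{G}$ that minimizes expected distortion. A minimal sketch on the Table 5 p.m.f.; `distortion_penalty` is a hypothetical name, and the squared-index distortion on the labels $a_1, \ldots, a_5$ is a made-up measure used only to show a non-Hamming case. With Hamming distortion the result coincides with $P_e(\mathcal{G})$, as stated above.

```python
def distortion_penalty(p, group, d):
    """D(G) for a 1-level group G of Y in side-information coding of Y given X.

    For each x the decoder picks the reproduction y_hat in G that minimizes
    sum_{y in G} p(x, y) d(y, y_hat); D(G) adds these minima over x.
    """
    return sum(min(sum(p[x][y] * d(y, y_hat) for y in group) for y_hat in group)
               for x in p)


def hamming(a, b):
    return 0.0 if a == b else 1.0


def squared_index(a, b):
    # Hypothetical distortion on the labels a1..a5: squared difference of indices.
    return float((int(a[1:]) - int(b[1:])) ** 2)


SYMS = ["a1", "a2", "a3", "a4", "a5"]
ROWS = [[0.10, 0.00, 0.00, 0.00, 0.00],   # Table 5, X = a1
        [0.00, 0.10, 0.00, 0.00, 0.10],   # X = a2
        [0.00, 0.00, 0.10, 0.15, 0.20],   # X = a3
        [0.05, 0.00, 0.00, 0.00, 0.00],   # X = a4
        [0.00, 0.00, 0.05, 0.10, 0.05]]   # X = a5
P_XY = {x: dict(zip(SYMS, row)) for x, row in zip(SYMS, ROWS)}

print(round(distortion_penalty(P_XY, ["a4", "a5"], hamming), 4))        # 0.2
print(round(distortion_penalty(P_XY, ["a3", "a5"], squared_index), 4))  # 0.6
```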

Partition Design 

In near-lossless coding, any combination of symbols creates a legitimate 1-level group $\mathcal{G}$ (with some associated error $P_e(\mathcal{G})$ or $D(\mathcal{G})$). Thus one way to approach near-lossless MASC design is to consider all combinations of 1-level groups that yield an error within the allowed error limits, in each case design the optimal lossless code for the reduced alphabet that treats each such 1-level group $\mathcal{G}$ as a single symbol $x_{\mathcal{G}}$ ($x_{\mathcal{G}} \notin \mathcal{X}$ if $|\mathcal{G}| > 1$) or $y_{\mathcal{G}}$ ($y_{\mathcal{G}} \notin \mathcal{Y}$ if $|\mathcal{G}| > 1$), and finally choose the combination of groups that yields the lowest expected rates. Considering all combinations of groups that meet the error criterion guarantees an optimal solution, since any near-lossless MASC can be described as a lossless MASC on a reduced alphabet that represents each lossy 1-level group by a single symbol.
For example, given a 1-level group $\mathcal{G} = (x_1, \ldots, x_m) \subseteq \mathcal{X}$, we can design a near-lossless MASC with error probability $P_e(\mathcal{G})$ by designing a lossless MASC for alphabets $\tilde{\mathcal{X}} = (\mathcal{X} \cap \{x_1, \ldots, x_m\}^c) \cup \{x_{\mathcal{G}}\}$ and $\mathcal{Y}$ and p.m.f.

$$\tilde{p}(x,y) = \begin{cases} p(x,y) & \text{if } x \in \mathcal{X} \cap \{x_1, \ldots, x_m\}^c \\ \sum_{i=1}^{m} p(x_i, y) & \text{if } x = x_{\mathcal{G}}. \end{cases}$$

Thus designing a near-lossless MASC for $p(x,y)$ that uses only one lossy group $\mathcal{G}$ is equivalent to designing a lossless MASC for the probability distribution $\tilde{p}(x,y)$, where the matrix describing $\tilde{p}(x,y)$ can be obtained by removing from the matrix describing $p(x,y)$ the rows for symbols $x_1, \ldots, x_m \in \mathcal{G}$ and adding a row for $x_{\mathcal{G}}$. The row associated with $x_{\mathcal{G}}$ equals the sum of the rows removed. Similarly, building a near-lossless MASC using 1-level group $\mathcal{G} \subseteq \mathcal{Y}$ is equivalent to building a lossless MASC for a p.m.f. in which we remove the columns for all $y \in \mathcal{G}$ and include a column that equals the sum of those columns.
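Forming the reduced p.m.f. $\tilde{p}$ is a row (or column) merge of the joint probability matrix. A minimal Python sketch, assuming the p.m.f. is stored as a dict of dicts `p[x][y]`; `merge_rows`, `merge_cols`, and the new labels `aG` and `bG` are hypothetical, and the data come from Table 5 of this document.

```python
def merge_rows(p, group, new_symbol):
    """Reduced p.m.f. for one 1-level group of X: the rows of `group` are
    removed and replaced by a single row `new_symbol` equal to their sum."""
    ys = list(next(iter(p.values())))
    reduced = {x: dict(row) for x, row in p.items() if x not in group}
    reduced[new_symbol] = {y: sum(p[x][y] for x in group) for y in ys}
    return reduced


def merge_cols(p, group, new_symbol):
    """Column version for a 1-level group of Y."""
    return {x: {**{y: q for y, q in row.items() if y not in group},
                new_symbol: sum(row[y] for y in group)}
            for x, row in p.items()}


SYMS = ["a1", "a2", "a3", "a4", "a5"]
ROWS = [[0.10, 0.00, 0.00, 0.00, 0.00],   # Table 5, X = a1
        [0.00, 0.10, 0.00, 0.00, 0.10],   # X = a2
        [0.00, 0.00, 0.10, 0.15, 0.20],   # X = a3
        [0.05, 0.00, 0.00, 0.00, 0.00],   # X = a4
        [0.00, 0.00, 0.05, 0.10, 0.05]]   # X = a5
P_XY = {x: dict(zip(SYMS, row)) for x, row in zip(SYMS, ROWS)}

reduced_x = merge_rows(P_XY, {"a1", "a4"}, "aG")   # group {a1, a4} of X
reduced_y = merge_cols(P_XY, {"a4", "a5"}, "bG")   # group {a4, a5} of Y
print(reduced_x["aG"])
print(reduced_y["a3"])
```

Total probability is preserved by either merge, so the result is again a valid p.m.f. on which lossless MASC design can be run.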

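Multiple (non-overlapping) 1-level groups in $\mathcal{X}$ or $\mathcal{Y}$ can be treated similarly. In using groups $\mathcal{G}_1, \mathcal{G}_2 \subseteq \mathcal{X}$, the error probability adds, but in using groups $\mathcal{G}_X \subseteq \mathcal{X}$ and $\mathcal{G}_Y \subseteq \mathcal{Y}$ the effect on the error probability is not necessarily additive. For example, if $\mathcal{G}_X = (x_1, \ldots, x_m)$ and $\mathcal{G}_Y = (y_1, \ldots, y_k)$, then the error penalty is

$$P_e(\mathcal{G}_X, \mathcal{G}_Y) = \sum_{x \in \mathcal{X} \setminus R} \Big[ \sum_{y \in C} p(x,y) - \max_{y \in C} p(x,y) \Big] + \sum_{y \in \mathcal{Y}} \Big[ \sum_{x \in R} p(x,y) - \max_{x \in R} p(x,y) \Big] + \sum_{y \in C} \max_{x \in R} p(x,y) - \max_{x \in R,\, y \in C} p(x,y),$$

where $R = \{x_1, \ldots, x_m\}$ and $C = \{y_1, \ldots, y_k\}$. Since using just $\mathcal{G}_X$ gives

$$P_e(\mathcal{G}_X) = \sum_{y \in \mathcal{Y}} \Big[ \sum_{x \in R} p(x,y) - \max_{x \in R} p(x,y) \Big]$$

and using just $\mathcal{G}_Y$ gives

$$P_e(\mathcal{G}_Y) = \sum_{x \in \mathcal{X}} \Big[ \sum_{y \in C} p(x,y) - \max_{y \in C} p(x,y) \Big],$$

we have

$$P_e(\mathcal{G}_X, \mathcal{G}_Y) = P_e(\mathcal{G}_X) + P_e(\mathcal{G}_Y) - \delta(\mathcal{G}_X, \mathcal{G}_Y),$$

where

$$\delta(\mathcal{G}_X, \mathcal{G}_Y) = \sum_{x \in R} \sum_{y \in C} p(x,y) - \sum_{x \in R} \max_{y \in C} p(x,y) - \sum_{y \in C} \max_{x \in R} p(x,y) + \max_{x \in R,\, y \in C} p(x,y) \tag{9}$$

is not necessarily equal to zero. Generalizing the above results to multiple groups $\mathcal{G}_{X,1}, \ldots, \mathcal{G}_{X,M}$ and $\mathcal{G}_{Y,1}, \ldots, \mathcal{G}_{Y,K}$ corresponding to row and column sets $\{R_1, R_2, \ldots, R_M\}$ and $\{C_1, C_2, \ldots, C_K\}$, respectively, gives total error penalty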
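$$P_e\big(\{\mathcal{G}_{X,1}, \mathcal{G}_{X,2}, \ldots, \mathcal{G}_{X,M}\}, \{\mathcal{G}_{Y,1}, \mathcal{G}_{Y,2}, \ldots, \mathcal{G}_{Y,K}\}\big) = \sum_{i=1}^{M} P_e(\mathcal{G}_{X,i}) + \sum_{j=1}^{K} P_e(\mathcal{G}_{Y,j}) - \sum_{i=1}^{M} \sum_{j=1}^{K} \delta(\mathcal{G}_{X,i}, \mathcal{G}_{Y,j}). \tag{10}$$

Here $\delta(\mathcal{G}_{X,i}, \mathcal{G}_{Y,j})$ is computed as in (9) with $R = R_i$ and $C = C_j$.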
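Equations (9) and (10) can be checked numerically against a brute-force computation that merges the groups and sums, over every cell of the reduced matrix, the cell mass minus its largest entry. A minimal sketch on the Table 5 p.m.f.; all function names are hypothetical, and the row and column groups are arbitrary illustrative choices.

```python
SYMS = ["a1", "a2", "a3", "a4", "a5"]
ROWS = [[0.10, 0.00, 0.00, 0.00, 0.00],   # Table 5, X = a1
        [0.00, 0.10, 0.00, 0.00, 0.10],   # X = a2
        [0.00, 0.00, 0.10, 0.15, 0.20],   # X = a3
        [0.05, 0.00, 0.00, 0.00, 0.00],   # X = a4
        [0.00, 0.00, 0.05, 0.10, 0.05]]   # X = a5
P_XY = {x: dict(zip(SYMS, row)) for x, row in zip(SYMS, ROWS)}


def pe_rows(p, R):       # P_e of a 1-level group of X with symbol set R
    ys = list(next(iter(p.values())))
    return sum(sum(p[x][y] for x in R) - max(p[x][y] for x in R) for y in ys)


def pe_cols(p, C):       # P_e of a 1-level group of Y with symbol set C
    return sum(sum(p[x][y] for y in C) - max(p[x][y] for y in C) for x in p)


def delta(p, R, C):      # equation (9)
    return (sum(p[x][y] for x in R for y in C)
            - sum(max(p[x][y] for y in C) for x in R)
            - sum(max(p[x][y] for x in R) for y in C)
            + max(p[x][y] for x in R for y in C))


def pe_total(p, row_groups, col_groups):   # equation (10)
    return (sum(pe_rows(p, R) for R in row_groups)
            + sum(pe_cols(p, C) for C in col_groups)
            - sum(delta(p, R, C) for R in row_groups for C in col_groups))


def pe_direct(p, row_groups, col_groups):
    """Brute force: merge the groups, then add (cell mass - largest entry) per cell."""
    xs, ys = list(p), list(next(iter(p.values())))
    rblocks = list(row_groups) + [[x] for x in xs if not any(x in R for R in row_groups)]
    cblocks = list(col_groups) + [[y] for y in ys if not any(y in C for C in col_groups)]
    total = 0.0
    for R in rblocks:
        for C in cblocks:
            cell = [p[x][y] for x in R for y in C]
            total += sum(cell) - max(cell)
    return total


R, C = {"a3", "a5"}, {"a4", "a5"}
print(round(delta(P_XY, R, C), 4), round(pe_total(P_XY, [R], [C]), 4),
      round(pe_direct(P_XY, [R], [C]), 4))   # 0.05 0.35 0.35
```

The nonzero $\delta$ in this example shows why the penalties of a row group and a column group cannot simply be added.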



Using these results, we give our code design algorithm as follows.

In near-lossless coding of source $X$ given side information $Y$, we first make a list $\mathcal{L}_{X,\epsilon}$ of all lossy 1-level groups of $\mathcal{X}$ that result in error at most $\epsilon$ (the given constraint). (The earlier described lossless MASC design algorithm will find all zero-error 1-level groups.) Then a subset $\mathcal{S}_{X,\epsilon}$ of $\mathcal{L}_{X,\epsilon}$ such that $\mathcal{S}_{X,\epsilon}$ is non-overlapping and results in error at most $\epsilon$ is a combination of lossy 1-level groups with total error at most $\epsilon$. For each $\mathcal{S}_{X,\epsilon}$, obtain the reduced alphabet $\tilde{\mathcal{X}}$ and p.m.f. $\tilde{p}(x,y)$ by representing each group $\mathcal{G} \in \mathcal{S}_{X,\epsilon}$ by a single symbol $x_{\mathcal{G}}$ as described earlier. Then perform lossless side-information code design of $\tilde{X}$ on $\tilde{p}(x,y)$. After all subsets $\mathcal{S}_{X,\epsilon}$ are traversed, we can find the lowest rate for coding $X$ that results in error at most $\epsilon$. Near-lossless coding of $Y$ with side information $X$ can be performed in a similar fashion.
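The enumeration steps of this procedure (building $\mathcal{L}_{X,\epsilon}$ and the admissible subsets $\mathcal{S}_{X,\epsilon}$) can be sketched as follows. The sketch stops before the lossless side-information design on each reduced alphabet, which is described in earlier sections and not reproduced here. The function names, the round-off tolerance `TOL`, and the choice $\epsilon = 0.05$ on the Table 5 p.m.f. (rows taken as $X$) are assumptions for illustration.

```python
from itertools import combinations

TOL = 1e-12   # guards the "<= eps" tests against floating-point round-off

SYMS = ["a1", "a2", "a3", "a4", "a5"]
ROWS = [[0.10, 0.00, 0.00, 0.00, 0.00],   # Table 5, X = a1
        [0.00, 0.10, 0.00, 0.00, 0.10],   # X = a2
        [0.00, 0.00, 0.10, 0.15, 0.20],   # X = a3
        [0.05, 0.00, 0.00, 0.00, 0.00],   # X = a4
        [0.00, 0.00, 0.05, 0.10, 0.05]]   # X = a5
P_XY = {x: dict(zip(SYMS, row)) for x, row in zip(SYMS, ROWS)}


def group_error(p, group):
    """P_e of a 1-level group of X (rows merged) in side-information coding."""
    ys = next(iter(p.values())).keys()
    return sum(sum(p[x][y] for x in group) - max(p[x][y] for x in group) for y in ys)


def lossy_groups(p, eps):
    """L_{X,eps}: all groups of two or more symbols with 0 < P_e <= eps."""
    xs = sorted(p)
    out = []
    for size in range(2, len(xs) + 1):
        for g in combinations(xs, size):
            e = group_error(p, g)
            if TOL < e <= eps + TOL:
                out.append((frozenset(g), e))
    return out


def admissible_combinations(groups, eps):
    """All non-overlapping subsets S_{X,eps} whose errors sum to at most eps.

    Errors of disjoint groups of the same source add, so the total is a sum.
    The empty combination (plain lossless coding) is included.
    """
    results = [()]

    def extend(start, chosen, used, total):
        for idx in range(start, len(groups)):
            g, e = groups[idx]
            if g & used or total + e > eps + TOL:
                continue
            combo = chosen + (g,)
            results.append(combo)
            extend(idx + 1, combo, used | g, total + e)

    extend(0, (), frozenset(), 0.0)
    return results


groups = lossy_groups(P_XY, 0.05)
combos = admissible_combinations(groups, 0.05)
print(len(groups), "lossy groups,", len(combos), "combinations (including the empty one)")
```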

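To design general near-lossless MASCs of both $X$ and $Y$, we first make a list $\mathcal{L}_{X,\epsilon}$ of all 1-level groups of $\mathcal{X}$ that result in error at most $\epsilon$, and a list $\mathcal{L}_{Y,\epsilon}$ of all 1-level groups of $\mathcal{Y}$ that result in error at most $\epsilon$. (We include zero-error 1-level groups here, since using two zero-error 1-level groups $\mathcal{G}_X \subseteq \mathcal{X}$ and $\mathcal{G}_Y \subseteq \mathcal{Y}$ together may result in a non-zero error penalty.) Second, we make a list $\mathcal{L}_{\mathcal{S}_X,\epsilon} = \{\emptyset\} \cup \{\mathcal{S}_{X,\epsilon} \subseteq \mathcal{L}_{X,\epsilon} : \mathcal{S}_{X,\epsilon} \text{ is non-overlapping},\ P_e(\mathcal{S}_{X,\epsilon}) \le \epsilon\}$ of all combinations of 1-level groups of $\mathcal{X}$ that yield an error at most $\epsilon$, and a list $\mathcal{L}_{\mathcal{S}_Y,\epsilon} = \{\emptyset\} \cup \{\mathcal{S}_{Y,\epsilon} \subseteq \mathcal{L}_{Y,\epsilon} : \mathcal{S}_{Y,\epsilon} \text{ is non-overlapping},\ P_e(\mathcal{S}_{Y,\epsilon}) \le \epsilon\}$ of all combinations of 1-level groups of $\mathcal{Y}$ that yield an error at most $\epsilon$. (We include $\emptyset$ in the lists so that side-information coding is covered as a special case of general coding.) Then for each pair $(\mathcal{S}_{X,\epsilon}, \mathcal{S}_{Y,\epsilon})$, we calculate the corresponding $\delta$ values and the total error penalty using formulas (9) and (10). If the total error penalty is no more than $\epsilon$, we obtain the reduced alphabets $\tilde{\mathcal{X}}, \tilde{\mathcal{Y}}$ and p.m.f. $\tilde{p}(x,y)$ described by $(\mathcal{S}_{X,\epsilon}, \mathcal{S}_{Y,\epsilon})$, then perform lossless MASC design on $\tilde{p}(x,y)$. After all pairs $(\mathcal{S}_{X,\epsilon}, \mathcal{S}_{Y,\epsilon}) \in \mathcal{L}_{\mathcal{S}_X,\epsilon} \times \mathcal{L}_{\mathcal{S}_Y,\epsilon}$ are traversed, we can trace out the lower boundary of the achievable rate region.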

An Alternative Algorithm Embodiment 

We next describe an alternative method of code design. The following notation is useful 
to the description of that algorithm. 

The approach described below assumes a known collection of decisions on which 
symbols of y can be combined. If we are designing a side-information code, these decisions 
arise from the assumption that source X is known perfectly to the decoder and thus the conditions 
described in the section "Lossless Side-Information Coding" apply. If we are designing a code
for Y given an existing code for X, these conditions arise from the MASC prefix condition in 
Lemma 6. 
The algorithm also relies on an ordering of the alphabet $\mathcal{Y}$, denoted by $\mathcal{Y} = \{y_1, y_2, \ldots, y_N\}$. Here $N = |\mathcal{Y}|$ is the number of symbols in $\mathcal{Y}$, and for any $1 \le i < j \le N$, symbol $y_i$ is placed before symbol $y_j$ in the chosen ordering. Any ordering of the original alphabet is allowed. The ordering choice restricts the family of codes that can be designed. In particular, the constraints imposed by the ordering are as follows:

1. Two symbols can be combined into a one-level group if and only if

(a) they are combinable, and

(b) they hold adjacent positions in the ordering.

2. A one-level group can be combined with the root of a distinct (one- or multi-level) group if and only if

(a) the combination meets the conditions for combinability, and

(b) the groups hold adjacent positions in the ordering.

3. Two (one- or multi-level) groups can be made descendants of a single root if and only if the groups hold adjacent positions in the ordering.

4. The group formed by combining two symbols or two groups occupies the position associated with those symbols or groups in the alphabet ordering. Given that only adjacent symbols can be combined, there is no ambiguity in the position of a group.



We discuss methods for choosing the ordering below. 

Finally, we define a function $f$ used in the code design. For any $i \le j$, let $P[i,j] = \sum_{k=i}^{j} \Pr(y_k)$, and let $\mathcal{G}[i,j]$ denote the group that occupies positions $i$ through $j$. When the algorithm begins, only the groups $\mathcal{G}[i,i]$ are defined, with $\mathcal{G}[i,i] = (y_i)$ for each $i \in \{1, 2, \ldots, N\}$. The values of $\mathcal{G}[i,j]$ for each $i < j$ are set as the algorithm runs. The value of $\mathcal{G}[1,N]$ when the algorithm is completed is the desired code on the full alphabet. For any $p \in (0,1)$, let $H(p, 1-p) = -p \log p - (1-p) \log(1-p)$. Finally, for any $i \le j < k$, let $c[i,j,k]$ be defined as follows.



$$c[i,j,k] = \begin{cases} 0 & \text{if } w[i,j] = 0 \text{ and } \mathcal{G}[i,j] \text{ can be combined with the root of } \mathcal{G}[j+1,k] \\ 1 & \text{if } w[i,j] > 0,\ w[j+1,k] = 0, \text{ and } \mathcal{G}[j+1,k] \text{ can be combined with the root of } \mathcal{G}[i,j] \\ 2 & \text{otherwise} \end{cases}$$

The value of $c[i,j,k]$ describes whether the two adjacent groups $\mathcal{G}[i,j]$ and $\mathcal{G}[j+1,k]$ must be siblings under an empty root (when $c[i,j,k] = 2$) or one group can reside at the root of the other group (when $c[i,j,k] = 0$, $\mathcal{G}[i,j]$ can reside at the root of $\mathcal{G}[j+1,k]$; when $c[i,j,k] = 1$, $\mathcal{G}[j+1,k]$ can reside at the root of $\mathcal{G}[i,j]$). We cannot calculate $c[i,j,k]$ until $\mathcal{G}[i,j]$ and $\mathcal{G}[j+1,k]$ have been calculated.

The value of $f(w[i,j], w[j+1,k])$ is the rate of group $\mathcal{G}[i,k]$ when we use groups $\mathcal{G}[i,j]$ and $\mathcal{G}[j+1,k]$ to construct $\mathcal{G}[i,k]$. When $\mathcal{G}[i,j]$ can reside at the root of $\mathcal{G}[j+1,k]$, $f(w[i,j], w[j+1,k])$ equals $\mathcal{G}[j+1,k]$'s best rate; when $\mathcal{G}[j+1,k]$ can reside at the root of $\mathcal{G}[i,j]$, $f(w[i,j], w[j+1,k])$ equals $\mathcal{G}[i,j]$'s best rate; when $\mathcal{G}[i,j]$ and $\mathcal{G}[j+1,k]$ must be siblings, $f(w[i,j], w[j+1,k])$ equals $w^{\circ}[i,j,k]$. The best rate of $\mathcal{G}[i,k]$ is the minimal value of $f(w[i,j], w[j+1,k])$ over all $j \in \{i, i+1, \ldots, k-1\}$. The function $f(w[i,j], w[j+1,k])$ is calculated as follows:

$$f(w[i,j], w[j+1,k]) = \begin{cases} w[j+1,k] & \text{if } c[i,j,k] = 0 \\ w[i,j] & \text{if } c[i,j,k] = 1 \\ w^{\circ}[i,j,k] & \text{if } c[i,j,k] = 2 \end{cases}$$

Here,

$$w^{\circ}[i,j,k] = \begin{cases} w[i,j] + w[j+1,k] + P[i,k] & \text{in Huffman coding} \\ w[i,j] + w[j+1,k] + P[i,k]\, H\!\left(\dfrac{P[i,j]}{P[i,k]}, \dfrac{P[j+1,k]}{P[i,k]}\right) & \text{in arithmetic coding} \end{cases}$$



Given the above definitions, we use the following algorithm for code design. 
1. Choose an order for alphabet $\mathcal{Y}$. In this step, we simply choose one of the $|\mathcal{Y}|!/2$ distinct orderings of the symbols in $\mathcal{Y}$. (An ordering and its reversal are identical for our purposes.)

2. Initialize $w[i,i] = 0$ and $\mathcal{G}[i,i] = (y_i)$ for all $i \in \{1, 2, \ldots, N\}$.

3. For each $L \in \{1, 2, \ldots, N-1\}$:

a. For each $i \in \{1, 2, \ldots, N-L\}$, set

$$w[i, i+L] = \min_{j \in \{i, i+1, \ldots, i+L-1\}} f(w[i,j], w[j+1, i+L]).$$

b. Let $j^* = \arg\min_{j \in \{i, i+1, \ldots, i+L-1\}} f(w[i,j], w[j+1, i+L])$, then set

$$\mathcal{G}[i, i+L] = \begin{cases} \mathcal{G}[i,j^*] \text{ combined with the root of } \mathcal{G}[j^*+1, i+L] & \text{if } c[i,j^*,i+L] = 0 \\ \mathcal{G}[j^*+1, i+L] \text{ combined with the root of } \mathcal{G}[i,j^*] & \text{if } c[i,j^*,i+L] = 1 \\ \mathcal{G}[i,j^*] \text{ and } \mathcal{G}[j^*+1, i+L] \text{ siblings under an empty root} & \text{if } c[i,j^*,i+L] = 2 \end{cases}$$

When the above procedure is complete, $\mathcal{G}[1,N]$ is an optimal code subject to the constraints imposed by ordering $\{y_1, y_2, \ldots, y_N\}$, and $w[1,N]$ gives its expected description length.
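The dynamic program above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: combinability is taken to be the side-information condition (no $x$ with $p(x,y)p(x,y') > 0$); a one-level group is allowed to join the root of another group only if each of its symbols is combinable with every symbol of that group; all symbol probabilities are assumed positive; logarithms are base 2; and the names `Group` and `design_ordered_code` are hypothetical. Indices are 0-based in the code and 1-based in the text. The usage example runs on the joint p.m.f. of Table 5 of this document.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Group:
    root: tuple                                   # symbols at the root node (may be empty)
    children: list = field(default_factory=list)  # descendant groups, in ordering order

    def symbols(self):
        out = list(self.root)
        for child in self.children:
            out.extend(child.symbols())
        return out

    def __str__(self):  # printed in the document's (root):{children} notation
        r = "(" + ",".join(self.root) + ")"
        return r if not self.children else "(" + r + ":{" + ",".join(map(str, self.children)) + "})"


def design_ordered_code(order, p, coding="huffman"):
    """Ordering-constrained design of a side-information code for Y given X.

    order: Y symbols in the chosen ordering; p[x][y]: joint p.m.f.;
    coding: "huffman" or "arithmetic" (selects the sibling cost w°).
    Returns (G[1,N], w[1,N]); indices are 0-based internally.
    """
    n = len(order)
    prefix = [0.0]
    for y in order:
        prefix.append(prefix[-1] + sum(p[x][y] for x in p))

    def P(i, j):                    # P[i, j]: probability of positions i..j
        return prefix[j + 1] - prefix[i]

    def combinable(a, b):           # no x gives both a and b positive probability
        return all(p[x][a] * p[x][b] == 0 for x in p)

    def can_join(g1, g2):           # g1 (one-level) may sit at the root of g2
        return all(combinable(a, b) for a in g1.symbols() for b in g2.symbols())

    def h2(t):                      # binary entropy H(t, 1 - t) in bits
        return 0.0 if t <= 0.0 or t >= 1.0 else -t * math.log2(t) - (1 - t) * math.log2(1 - t)

    w = [[0.0] * n for _ in range(n)]
    G = [[None] * n for _ in range(n)]
    for i in range(n):
        G[i][i] = Group((order[i],))

    for L in range(1, n):
        for i in range(n - L):
            k = i + L
            best = None
            for j in range(i, k):
                left, right = G[i][j], G[j + 1][k]
                # w == 0 exactly for one-level groups (their zeros are copied, never computed)
                if w[i][j] == 0 and can_join(left, right):
                    c, f = 0, w[j + 1][k]
                elif w[i][j] > 0 and w[j + 1][k] == 0 and can_join(right, left):
                    c, f = 1, w[i][j]
                else:
                    c = 2
                    extra = P(i, k) if coding == "huffman" else P(i, k) * h2(P(i, j) / P(i, k))
                    f = w[i][j] + w[j + 1][k] + extra
                if best is None or f < best[0]:
                    best = (f, j, c)
            f, j, c = best
            left, right = G[i][j], G[j + 1][k]
            w[i][k] = f
            if c == 0:      # left joins the root of right
                G[i][k] = Group(left.root + right.root, right.children)
            elif c == 1:    # right joins the root of left
                G[i][k] = Group(left.root + right.root, left.children)
            else:           # siblings under an empty root
                G[i][k] = Group((), [left, right])
    return G[0][n - 1], w[0][n - 1]


SYMS = ["a1", "a2", "a3", "a4", "a5"]
ROWS = [[0.10, 0.00, 0.00, 0.00, 0.00],   # Table 5, X = a1
        [0.00, 0.10, 0.00, 0.00, 0.10],   # X = a2
        [0.00, 0.00, 0.10, 0.15, 0.20],   # X = a3
        [0.05, 0.00, 0.00, 0.00, 0.00],   # X = a4
        [0.00, 0.00, 0.05, 0.10, 0.05]]   # X = a5
P_XY = {x: dict(zip(SYMS, row)) for x, row in zip(SYMS, ROWS)}

code, rate = design_ordered_code(SYMS, P_XY)
print(code, round(rate, 4))   # ((a1):{((a2):{(a3),(a4)}),(a5)}) 1.25
```

Ties in the minimization are broken in favor of the smallest $j$. On Table 5 this sketch reproduces the code and rate obtained in the worked Huffman example later in this section.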

Figure 13 illustrates the process in the alternative algorithm embodiment. At box 1301, an ordering of the alphabet is fixed. Then at box 1302, the variables (weight, group, etc.) are initialized. At box 1303, $L$ is set to 1. At box 1304, $i$ is set to 1. $L$ and $i$ are counter variables for the loop starting at box 1305, which iterates through the ordering and progressively creates larger combinations out of adjacent groups until an optimal code for the ordering is obtained. At box 1305, the current combination $(i, j, i+L)$ is checked for combinability. The function $f$ for the combination is also determined at this point. At box 1306, the weight and grouping of the current combination are determined. At box 1307, it is determined whether $i < N-L$. If it is, the process increments $i$ at box 1310 and returns to box 1305. If not, it proceeds to box 1308, where a determination of whether $L < N-1$ is made. If it is, the process increments $L$ and returns to box 1304. If not, the loop is complete and the process terminates at box 1309. The optimal code and rate have been obtained.

The algorithm may be used in a number of different ways. 

1 . The code designer may simply fix the ordering, either to a choice that is believed to be 
good or to a randomly chosen value, and simply use the code designed for that order. For 
example, since only symbols that are adjacent can be combined, the designer may choose 
an ordering that gives adjacent positions to many of the combinable symbols. 

2. Alternatively, the designer may consider multiple orderings, finding the optimal code for 
each ordering and finally using the ordering that gives the best expected performance. 

3. The designer may also choose a first ordering $O_1$ at random and find the best code $\mathcal{G}(O_1)$ for this ordering; then for each $m \in \{1, 2, \ldots, M\}$, the designer could permute ordering $O_m$ using one or more of the permutation operations described below to find an ordering $O_{m+1}$. For the given permutation operations, $\mathcal{G}(O_{m+1})$ is guaranteed to be at least as good as $\mathcal{G}(O_m)$, since $O_{m+1}$ is consistent with $\mathcal{G}(O_m)$. This solution involves running the design algorithm $M+1$ times. The value of $M$ can be chosen to balance performance and complexity concerns. Here we list four methods to derive a new ordering from an old ordering such that the new ordering's performance is guaranteed to be at least as good as that of the old ordering. Suppose the old ordering $O_m$ is $\{y_1, \ldots, y_N\}$.

(a) Let $\mathcal{G}[i,j]$ and $\mathcal{G}[j+1,k]$ ($i \le j < k$) be any two subtrees descending from the same parent in $\mathcal{G}(O_m)$. The new ordering $O_{m+1}$ is $\{y_1, \ldots, y_{i-1}, y_{j+1}, \ldots, y_k, y_i, \ldots, y_j, y_{k+1}, \ldots, y_N\}$.

(b) Let $\mathcal{R}[i,j]$ be the root of subtree $\mathcal{G}[i,k]$ ($i \le j < k$) in $\mathcal{G}(O_m)$. The new ordering $O_{m+1}$ is $\{y_1, \ldots, y_{i-1}, y_{j+1}, \ldots, y_k, y_i, \ldots, y_j, y_{k+1}, \ldots, y_N\}$.

(c) Let $\mathcal{R}[i,j]$ be the root of subtree $\mathcal{G}[k,j]$ ($k < i \le j$) in $\mathcal{G}(O_m)$. The new ordering $O_{m+1}$ is $\{y_1, \ldots, y_{k-1}, y_i, \ldots, y_j, y_k, \ldots, y_{i-1}, y_{j+1}, \ldots, y_N\}$.

(d) Suppose the subroot $\mathcal{R}[i,j]$ in $\mathcal{G}(O_m)$ is a one-level group with more than one symbol. Any permutation of the sub-ordering $\{y_i, \ldots, y_j\}$ results in a new ordering.

4. Any combination of random choices and permutations of the ordering can be used. 
5. The designer may also be willing to try all orderings to find the optimal code.

Here we note that trying all orderings guarantees the optimal performance. Choosing a 
sequence of orders at random gives performance approaching the optimal performance in 
probability. 
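Trying every ordering means running the design once per ordering, up to reversal. A minimal sketch of enumerating one representative per {ordering, reversal} pair, assuming distinct, mutually comparable symbol labels; `distinct_orderings` is a hypothetical name. For the eight-symbol alphabet used with Table 3, this gives $8!/2 = 20160$ orderings, each requiring one run of the design algorithm.

```python
from itertools import permutations
from math import factorial


def distinct_orderings(alphabet):
    """Yield one representative of each {ordering, reversed ordering} pair.

    Assumes distinct, mutually comparable symbols; the lexicographically smaller
    member of each pair is kept, giving |Y|!/2 orderings for |Y| >= 2.
    """
    for perm in permutations(alphabet):
        if perm < perm[::-1]:
            yield perm


symbols = [f"a{i}" for i in range(5)]
count = sum(1 for _ in distinct_orderings(symbols))
print(count, factorial(len(symbols)) // 2)   # 60 60
```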

Huffman Coding Example for the New Algorithm 

Table 5 gives another example of the joint probability of sources $X$ and $Y$, with $\mathcal{X} = \mathcal{Y} = \{a_1, a_2, a_3, a_4, a_5\}$. Suppose $X$ is given as side information; we now find the optimal Huffman code for $Y$ subject to the constraints imposed by ordering $\{a_1, a_2, a_3, a_4, a_5\}$ on $\mathcal{Y}$.
Table 5

X\Y    a1      a2      a3      a4      a5
a1     0.1     0.0     0.0     0.0     0.0
a2     0.0     0.1     0.0     0.0     0.1
a3     0.0     0.0     0.1     0.15    0.2
a4     0.05    0.0     0.0     0.0     0.0
a5     0.0     0.0     0.05    0.1     0.05
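The combinability decisions and the probabilities $P[i,j]$ used in the worked example below follow directly from Table 5. A short Python check (variable names are hypothetical): symbols $y, y'$ are combinable when no $x$ gives both positive probability, and the marginal of $Y$ is the column sum.

```python
from itertools import combinations

SYMS = ["a1", "a2", "a3", "a4", "a5"]
ROWS = [[0.10, 0.00, 0.00, 0.00, 0.00],   # Table 5, X = a1
        [0.00, 0.10, 0.00, 0.00, 0.10],   # X = a2
        [0.00, 0.00, 0.10, 0.15, 0.20],   # X = a3
        [0.05, 0.00, 0.00, 0.00, 0.00],   # X = a4
        [0.00, 0.00, 0.05, 0.10, 0.05]]   # X = a5
P_XY = {x: dict(zip(SYMS, row)) for x, row in zip(SYMS, ROWS)}

# Marginal of Y: column sums of Table 5.
p_y = {y: sum(P_XY[x][y] for x in SYMS) for y in SYMS}
# Pairs that cannot share a description when X is known to the decoder.
non_combinable = [(y1, y2) for y1, y2 in combinations(SYMS, 2)
                  if any(P_XY[x][y1] * P_XY[x][y2] > 0 for x in SYMS)]
print({y: round(q, 2) for y, q in p_y.items()})
# {'a1': 0.15, 'a2': 0.1, 'a3': 0.15, 'a4': 0.25, 'a5': 0.35}
print(non_combinable)
# [('a2', 'a5'), ('a3', 'a4'), ('a3', 'a5'), ('a4', 'a5')]
```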



Initialize: $w[i,i] = 0$, $\mathcal{G}[i,i] = (a_i)$, $i \in \{1, \ldots, 5\}$.

L = 1: $a_1$ and $a_2$ are combinable, so $w[1,2] = 0$, $\mathcal{G}[1,2] = (a_1, a_2)$;

$a_2$ and $a_3$ are combinable, so $w[2,3] = 0$, $\mathcal{G}[2,3] = (a_2, a_3)$;

$a_3$ and $a_4$ are not combinable, so $w[3,4] = P[3,4] = 0.4$, $\mathcal{G}[3,4] = (():\{(a_3),(a_4)\})$;

$a_4$ and $a_5$ are not combinable, so $w[4,5] = P[4,5] = 0.6$, $\mathcal{G}[4,5] = (():\{(a_4),(a_5)\})$.

L = 2:

i = 1: $c[1,1,3] = 0$ (since $w[1,1] = 0$ and $\mathcal{G}[1,1] = (a_1)$ can be combined with the root of $\mathcal{G}[2,3] = (a_2, a_3)$), so $f(w[1,1], w[2,3]) = w[2,3] = 0$, which is the minimal value. Thus $w[1,3] = 0$, $\mathcal{G}[1,3] = (a_1, a_2, a_3)$;

i = 2: $c[2,2,4] = 0$ (since $w[2,2] = 0$ and $\mathcal{G}[2,2] = (a_2)$ can be combined with the root of $\mathcal{G}[3,4] = (():\{(a_3),(a_4)\})$), so $f(w[2,2], w[3,4]) = w[3,4] = 0.4$;

$c[2,3,4] = 2$ (since $w[2,3] = 0$ but $\mathcal{G}[2,3] = (a_2, a_3)$ cannot be combined with the root of $\mathcal{G}[4,4] = (a_4)$), so $f(w[2,3], w[4,4]) = w^{\circ}[2,3,4] = w[2,3] + P[2,4] = 0.5$.

So $w[2,4] = 0.4$, $\mathcal{G}[2,4] = ((a_2):\{(a_3),(a_4)\})$.

i = 3: $c[3,3,5] = 2$, $f(w[3,3], w[4,5]) = w^{\circ}[3,3,5] = w[4,5] + P[3,5] = 1.35$;

$c[3,4,5] = 2$ (since $w[3,4] > 0$ and $w[5,5] = 0$, but $\mathcal{G}[5,5] = (a_5)$ cannot be combined with the root of $\mathcal{G}[3,4] = (():\{(a_3),(a_4)\})$), $f(w[3,4], w[5,5]) = w^{\circ}[3,4,5] = w[3,4] + P[3,5] = 1.15$.

So $w[3,5] = 1.15$, $\mathcal{G}[3,5] = (():\{(():\{(a_3),(a_4)\}),(a_5)\})$.

L = 3:

i = 1: $c[1,1,4] = 0$, $f(w[1,1], w[2,4]) = w[2,4] = 0.4$;

$c[1,2,4] = 0$, $f(w[1,2], w[3,4]) = w[3,4] = 0.4$;

$c[1,3,4] = 2$, $f(w[1,3], w[4,4]) = w^{\circ}[1,3,4] = w[1,3] + P[1,4] = 0.65$.

So $w[1,4] = 0.4$, $\mathcal{G}[1,4] = ((a_1, a_2):\{(a_3),(a_4)\})$.

i = 2: $c[2,2,5] = 2$, $f(w[2,2], w[3,5]) = w^{\circ}[2,2,5] = w[3,5] + P[2,5] = 2$;

$c[2,3,5] = 2$, $f(w[2,3], w[4,5]) = w^{\circ}[2,3,5] = w[4,5] + P[2,5] = 1.45$;

$c[2,4,5] = 2$, $f(w[2,4], w[5,5]) = w^{\circ}[2,4,5] = w[2,4] + P[2,5] = 1.25$.

So $w[2,5] = 1.25$, $\mathcal{G}[2,5] = (():\{((a_2):\{(a_3),(a_4)\}),(a_5)\})$.

L = 4:

i = 1: $c[1,1,5] = 0$, $f(w[1,1], w[2,5]) = w[2,5] = 1.25$;

$c[1,2,5] = 2$, $f(w[1,2], w[3,5]) = w^{\circ}[1,2,5] = w[3,5] + P[1,5] = 2.15$;

$c[1,3,5] = 2$, $f(w[1,3], w[4,5]) = w^{\circ}[1,3,5] = w[4,5] + P[1,5] = 1.6$;

$c[1,4,5] = 2$, $f(w[1,4], w[5,5]) = w^{\circ}[1,4,5] = w[1,4] + P[1,5] = 1.4$.

So $w[1,5] = 1.25$, $\mathcal{G}[1,5] = ((a_1):\{((a_2):\{(a_3),(a_4)\}),(a_5)\})$.

Thus the optimal Huffman code subject to the constraints imposed by ordering $\{a_1, a_2, a_3, a_4, a_5\}$ on $\mathcal{Y}$ is $\mathcal{G}[1,5] = ((a_1):\{((a_2):\{(a_3),(a_4)\}),(a_5)\})$, with rate $w[1,5] = 1.25$ bits.
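As a sanity check, the rate $w[1,5] = 1.25$ bits equals $\sum_y P(y)\,\mathrm{depth}(y)$ for the tree $\mathcal{G}[1,5]$, where the root symbol $a_1$ has an empty description. The sketch below assumes, as holds for this particular tree, that every internal node has exactly two children, so each level of the tree costs one bit; the nested-tuple representation and `expected_length` are hypothetical.

```python
SYMS = ["a1", "a2", "a3", "a4", "a5"]
ROWS = [[0.10, 0.00, 0.00, 0.00, 0.00],   # Table 5, X = a1
        [0.00, 0.10, 0.00, 0.00, 0.10],   # X = a2
        [0.00, 0.00, 0.10, 0.15, 0.20],   # X = a3
        [0.05, 0.00, 0.00, 0.00, 0.00],   # X = a4
        [0.00, 0.00, 0.05, 0.10, 0.05]]   # X = a5
p_y = {y: sum(row[k] for row in ROWS) for k, y in enumerate(SYMS)}  # column sums

# G[1,5] = ((a1):{((a2):{(a3),(a4)}),(a5)}) as (root symbols, children).
G15 = (("a1",), [(("a2",), [(("a3",), []), (("a4",), [])]), (("a5",), [])])


def expected_length(group, probs, depth=0):
    """Sum of P(y) * depth(y); root symbols sit at depth 0 (empty description)."""
    root, children = group
    total = depth * sum(probs[y] for y in root)
    for child in children:
        total += expected_length(child, probs, depth + 1)
    return total


print(round(expected_length(G15, p_y), 4))   # 1.25
```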

Experimental Results 

This section shows optimal coding rates for lossless side-information MASCs, lossless 
general MASCs, and near-lossless general MASCs for the example of Table 3. We achieve 
these results by building the optimal partitions and matched codes for each scenario, as discussed 
in earlier sections. Both Huffman and arithmetic coding rates are included. 

Table 6 below gives the side-information results for the example of Table 3. 
Table 6

Quantity               Rate (bits)
$H(X)$                 2.91412
$R_H(X)$               2.97
$H(Y)$                 2.91075
$H^{A}_{SI}(Y)$        1.67976
$H^{*}_{SI}(Y)$        1.53582
$R_H(Y)$               2.96
$R^{A}_{SI,H}(Y)$      1.75
$R^{*}_{SI,H}(Y)$      1.67

Here $H(X)$ and $R_H(X)$ are the optimal and Huffman rates for source $X$ when $X$ is coded independently. We use $[H(Y), H^{A}_{SI}(Y), H^{*}_{SI}(Y)]$ and $[R_H(Y), R^{A}_{SI,H}(Y), R^{*}_{SI,H}(Y)]$ to denote the optimal and Huffman results, respectively, for [traditional, side-information results from Jabri and Al-Issa, and our side-information] coding on $Y$. The partition trees achieving these results are shown in Figure 14. The rate achievable in coding $Y$ using side information $X$ is approximately half that of an ordinary Huffman code and 90% of that of the result from [2].

Figure 15 shows general lossless and lossy MASC results. The optimal lossless MASC gives significant performance improvement with respect to independent coding of $X$ and $Y$ but does not achieve the Slepian-Wolf region. By allowing error probability 0.01 (which equals $\min_{(x,y):\, p(x,y) > 0} p(x,y)$, i.e., the smallest error probability that may result in a different rate region than in lossless coding), the achievable rate region is greatly improved over lossless coding, showing the benefits of near-lossless coding. By allowing error probability 0.04, we get approximately to the Slepian-Wolf region for this example.

For the joint probability distribution given in Table 3 of the "Invention Operation" section, we apply the alternative algorithm embodiment (described above) to several orderings of the alphabet $\mathcal{Y} = \{a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7\}$, with $X$ given as side information.

For Huffman coding, many orderings can achieve the optimal performance ($R^{*}_{SI,H}(Y) = 1.67$), for example, orderings $(a_0, a_1, a_3, a_6, a_2, a_4, a_5, a_7)$, $(a_3, a_6, a_0, a_4, a_2, a_5, a_1, a_7)$, $(a_4, a_6, a_0, a_1, a_3, a_5, a_7, a_2)$, and $(a_7, a_2, a_3, a_5, a_4, a_6, a_1, a_0)$. These are just a few examples.

For arithmetic coding, again, many orderings can achieve the optimal performance ($H^{*}_{SI}(Y) = 1.53582$), for example, orderings $(a_0, a_4, a_1, a_5, a_2, a_7, a_3, a_6)$, $(a_1, a_5, a_2, a_0, a_4, a_7, a_6, a_3)$, $(a_5, a_1, a_2, a_4, a_0, a_7, a_6, a_3)$, and $(a_6, a_3, a_4, a_0, a_2, a_5, a_1, a_7)$. These are just a few examples.

Table 7 below gives examples of a few randomly chosen orderings' Huffman code rates 
and arithmetic code rates. 
Table 7

Ordering                               Huffman code rate    Arithmetic code rate
(a0, a1, a2, a3, a4, a5, a6, a7)       2.38                 2.1709
(a,j, a2, a4, a6, a0, a5, aj, a7)      1.98                 1.93391
(a4, a2, a0, a3, a1, a7, a6, a5)       2.35                 2.04459
(a^, a4, a5, a1, aj, a2, a3, a0)       2.14                 1.88941
(a6, a4, a3, a5, a1, a7, a0, a2)       1.85                 1.69265
(a7, a1, a0, a3, a4, a6, a2, a5)       1.8                  1.77697

Thus, an implementation of lossless and near-lossless source coding for multiple access networks is described in conjunction with one or more specific embodiments. The invention is defined by the claims and their full scope of equivalents.

The paper titled "Lossless and Near-Lossless Source Coding for Multiple Access 
Networks" by the inventors is attached as Appendix A. 